How To Make a Business Case For Process Mining

Fire-fighting should not be the mode in which we change processes

Most organizations have complex processes that are hard to manage and control. Moving out of a fire-fighting mode and stepping back to understand the existing business processes is the necessary starting point for process improvement.

A lot of data is available in IT systems (e.g., CRM, ERP, Workflow, PDM, ITSM, homemade or legacy systems, data warehouses, or even Excel) that contain detailed information about which activities are performed, when, and by whom. What is needed are quick methods to gain insight into this data to understand the underlying processes and take action.

Process mining can be used to quickly and objectively get a complete picture of a process by automatically analyzing the IT data and visualizing the real process flows that took place.

Many of you already know this and are convinced about the benefits of process mining. But you may still be asked to make a business case before you can start. So, how exactly can you quantify the added value of process mining?

You will need to look at the precise use case and business context in which you want to apply process mining. This article provides several example scenarios in which process mining helps internal process improvement departments. Many of these advantages also apply to other process mining use cases, such as auditing. At the end of the article, we provide you with an ROI template that you can use as a starting point for your own process mining business case.

1. Challenges for Process Improvement departments

A Lean Six Sigma or Process Improvement department, a Change Management team, a Process Excellence function, or a Process Performance or BPM group employs process experts and internal consultants who help different departments in the organization to re-structure and optimize their business processes. In cooperation with the process owners, these teams deliver process improvements that aim to provide long-lasting cost savings and increased revenue for the company.

The starting point for any process improvement project is the so-called ‘As-is’ process analysis, in which the current state and all the deficiencies of the process are mapped out, and improvement opportunities are identified. The traditional way of process discovery is carried out manually through workshops and interviews.

The advantage of process mining is the objective and quick diagnosis of process issues. If you use a traditional method of manual process discovery, you typically encounter the following problems:

  1. It takes lots of time for the people who do the interviews and process mapping. This is about the time that is spent by the internal consultants. If they could do their work faster, they could do more projects and therefore deliver more value for their organization.

  2. It binds people (interviewees) from productive work into discussions about how things are currently done just to understand the ‘as-is’ process. This is about the time other people (outside the process improvement department) must spend on ‘As-is’ process discovery. For example, suppose there is a one-week workshop with one consultant and ten employees from the operations department. In that case, this costs the company the consultant’s time plus 50 FTE days for the participating operations employees and other stakeholders.

  3. The results are subjective, based on what people think about the processes, not necessarily on how the processes really are. If these subjective insights are used as the basis for the improvement project, there is the risk that the applied improvement measures will not be effective (because they do not address the real problems).

  4. Different opinions may not be resolved (political deadlock). In situations where the participating people cannot agree on a unified view of the ‘As-is’ process and its problems, the risk is high that the project will fail, and no process improvement will be carried out at all.

  5. You get only a sample view, not the complete picture. The manual picture will never be complete if you ask people to spell out the process. Like with the risk of a subjective bias, an incomplete picture carries the risk of missing important parts in the process analysis and therefore the risk of implementing the wrong (ineffective or even counterproductive) improvement measures.

  6. Manual tracking and measurement is costly and biased and provides only sample data. This is about the practice of measuring process steps with a stopwatch to collect objective evidence: (1) Doing this manual work is very time-consuming, (2) People who are observed behave differently than normally, and (3) Only a limited sample can be obtained (e.g., tracing 30 cases over a few weeks).

  7. Because the cost of process diagnosis and information collection is so high, it cannot be easily repeated. Therefore, the impact of process improvements is often hard to estimate (“Did we really achieve what we wanted to achieve?”). Without a way to measure the effect of the improvement initiative, the actual value that the project delivered to the organization cannot be measured. There is also the risk that people fall back into old behavioral patterns after some time.

In the next section, we review how process mining can help to address these seven challenges.

2. Benefits of using process mining in process improvement projects

Process mining significantly lowers the cost of understanding the current process by bypassing interviews and extracting the necessary information out of the existing data from the IT systems. This way, you can focus your discussions on ‘why’ the processes are performed the way they are. Furthermore, iterative improvements with continuous assessment of the impact of changes become possible because you can repeat the analysis at any point in time at little cost.

The benefits of process mining with respect to the challenges discussed in the previous section are described in the following table:

Benefits of using process mining in process improvement

3. Ingredients of your business case

While the benefits of process mining are often obvious and it is clear that using process mining will “pay for itself,” you will most likely find yourself in the position of having to justify the investment. How do you do that? You create a business case for your management to approve the purchase.

Ultimately, each business case is unique, but you can follow a number of guidelines to put it together. In this section, we give you a starting point for how you can create a business case for your own process mining projects (see also the ROI template in the next section).

First, you can assemble the investments that you need to make. Think of the following components for a process mining project:

  • Software license cost - The use of the process mining software

  • Training cost - Educating the people who will use process mining about the approach and the software

  • Employee cost - Time investment of the people involved in the project (analysts but also additional stakeholders)

  • Data extraction cost - You may need to pay the IT department (or your external IT provider) to extract or transfer the data

  • Professional services - If you are getting support from an external consultancy for the project, their services need to be incorporated as well

To quantify the return, you can think in two dimensions:

  1. Saving costs: Where will you reduce your current expenses?

    There are two categories of cost savings that you can consider for process mining projects:

    1.1. Cost savings through more efficient As-Is Process Analysis

    Revisit the section ‘Benefits of using process mining in process improvement projects’ above to think through this category of cost savings. Doing at least one small pilot project in your organization (that you can compare with past projects that you had done in “the old way”) will help you get some hard numbers you can use for your business case.

    For example:

    • Time reduction for process analyst: How much faster was the process discovery for the analyst? How much effort did it take before to understand the current processes and collect and analyze data manually?

    • Less time for subject matter experts in the As-Is workshops: How much time have you saved for these stakeholders in the workshops by focusing on the ‘Why’ rather than the ‘What’?

    • Avoided risk of focusing improvement activities in the wrong area: It is easy to waste process improvement resources by focusing on the wrong places or processes. If you have seen project failures in the past, can you quantify their loss in proportion to the risk and include this component in your business case?

    1.2. Expected cost savings through the process improvements themselves

    If you make your business case for the overall process improvement project (not just the use of process mining in your current process improvement activities), include the potential cost savings from the improved processes. This is usually difficult to quantify beforehand because you do not yet know how much improvement potential you will find.

    Here, too, it can help to first do a quick scan of your process to get some hard numbers, if possible. Alternatively, you may be able to draw upon your experience from past process improvement projects or your domain expertise about the process and the potential value of improving it just a little bit (especially for high-volume or high-value processes, small improvements can have a big impact).

    The expected savings will then be quantified in a unit that makes sense for your particular process.

    For example:

    • Fewer steps: How many steps are needed to complete one case in your process? Your process mining analysis can often determine an ideal range of steps. Cases that take more steps indicate rework. Your improvement project will focus on finding the root causes and avoiding this rework in the future.

      You can then determine the improvement potential based on the process’s volume. For example, for the incident management process at a bank, the cost of one step was estimated to be around 20 Euros. If you can save 200,000 steps within one year, this cost reduction is 4 million Euros.

    • Reducing total activity time: In processes where the time of activities incurs significant costs, reducing their duration can amount to cost savings. For example, in call centers, one minute of an agent being on the phone with a customer is often quantified with 1 Euro in costs.

    • Avoided penalties: In some processes, missing specified Service Level Agreements (SLAs), for example for how fast the service is delivered, leads to penalties that need to be paid. If improving the process leads to a speed-up that helps to meet the contract SLA for more cases, then these penalty payments can be reduced.

    • Improved quality: If poor quality of the delivered product or service leads to rejection by the customer, then all the steps that were taken to deliver this service or product were essentially wasted. Clear processes with quality assurance measures along the way, and adherence to these process guidelines (i.e., compliant processes), can ensure that services are delivered more consistently. If your process mining analysis helps to reduce the variation in the process and improves compliance, then the reduced rejection rates can be quantified in terms of less waste (i.e., cost savings).

    • Fewer people: If, by improving the process’s efficiency, fewer people can handle the same amount of work, then their salaries can be used to quantify the cost savings.

  2. Increasing revenue: What will you gain that you did not have before?

    If you can tie your process improvements to additional revenue, your business case should state these expected gains and put them in perspective with the investment.

    For example:

    • More orders processed: In a telecom process, almost half of the customer orders were lost because the ordering process did not work well. Solving these process problems increases revenue by the value of these otherwise lost orders.

    • Happier customers leading to more business: In customer service processes, the focus is often not so much on saving costs but on making customers happy and, therefore, keeping them as customers and having them recommend the company to their friends and colleagues. Process improvement initiatives in this area target the increase of customer satisfaction measures, such as the Net Promoter Score (NPS), which are then used as a proxy to estimate the resulting increase in revenue.

    • More payments: In processes that deliver services to their customers based on contract SLAs, it can be the case that the customer only pays for the service if it was delivered on time. If the process efficiency can be improved in such a way that more cases meet the SLA, then this leads to an increase in revenue.

    • Sustaining improvements: One of the challenges discussed above was that traditional process improvement projects have trouble maintaining the effects of their process improvements over time. If people slide back into old patterns, then the gains from these process improvement initiatives are lost. Process mining can help to verify and monitor the effectiveness of past process improvements to sustain the improvements.

Finally, even if they are harder to quantify, make sure to mention “softer” benefits such as being in control, being proactive, etc., and also think about whether you can tie your proposal to your company’s strategic goals. For example, many organizations place importance on digital transformation in their strategic agendas for the next five years.

4. ROI template

You can create an Excel sheet where you list the investment on one side and the expected cost savings and revenue increase on the other side, to show how long it would take to recoup the investment over time.

For example, if a total investment of 100,000 Euros will lead to an expected revenue increase of 20,000 Euros per month, your business case shows that the investment has paid for itself within five months. After that, every month will continue to deliver an additional 20,000 Euros for the business.
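
If you prefer to script this calculation alongside the spreadsheet, the following minimal Python sketch computes the payback period from the investment components and monthly benefits discussed above. All figures are placeholder assumptions for illustration; replace them with your own estimates.

# Minimal payback-period sketch for a process mining business case.
# All figures are placeholder assumptions; substitute your own estimates.

investment = {
    "software_license": 40_000,      # e.g., yearly license cost (assumed)
    "training": 10_000,
    "employee_time": 30_000,
    "data_extraction": 10_000,
    "professional_services": 10_000,
}

monthly_benefit = {
    "cost_savings": 12_000,          # e.g., faster As-Is analysis, fewer process steps
    "revenue_increase": 8_000,       # e.g., more orders processed
}

total_investment = sum(investment.values())              # 100,000 in this example
total_monthly_benefit = sum(monthly_benefit.values())    # 20,000 in this example

# Months until the cumulative benefit covers the investment
payback_months = total_investment / total_monthly_benefit

print(f"Total investment: {total_investment:,} EUR")
print(f"Monthly benefit:  {total_monthly_benefit:,} EUR")
print(f"Payback period:   {payback_months:.1f} months")

Running this with the example figures above confirms the five-month payback period.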

To make this calculation easier, we have created an ROI template that you can use as a starting point:

Directly download ROI Business Case Template in PDF format (contact us for the Excel and Numbers versions)

ROI Business Case Template

When you look at the template, you will see that it follows the same structure as this article. However, it is more detailed and provides examples for process improvement roles, auditors and controllers, IT departments, and process owners. Furthermore, it includes examples of soft benefits and further references.

We are curious if you get to use the ROI template and if you have any additions or feedback about it. Let us know, and if you find it useful, share it with your colleagues!

5. Process Mining Café

Business cases can be tricky. What do you include in your calculation, and what do you leave out? What is a realistic assessment, and what is just wishful thinking?

In this month’s Process Mining Café next week on 17 April, we will look at a concrete process mining project and discuss its business case in more detail. We would love for you to join us and share your own process mining business cases.

Keep an eye on this blog or sign up for our café mailing list to be notified about this Process Mining Café and all future café editions.

Discovering Process Flows For The Bus Lines in Montevideo

In last week’s Process Mining Café, Andrea Delgado and Daniel Calegari from the Universidad de la República, Uruguay, showed us how they used process mining on open data for an urban mobility project.

Before the session, they had already shared a detailed description along with all the scripts to reproduce their approach. We had a lively discussion in the café, and you are invited to continue the conversation, whether via email, on LinkedIn, or with Andrea and Daniel directly.

You can now watch the recording here if you missed the live broadcast or want to re-watch the café. Thanks to Puttwaldo, Scott, and Miguel for the discussion, and a big thanks to Andrea, Daniel, and all of you for joining us!


Contact us anytime at cafe@fluxicon.com if you have questions or suggestions for the café.


Have you seen that the Process Mining Café is also available as a podcast? So, if you prefer to listen to our episodes in your favorite podcast player, you can get them all here.

Sign up for our café mailing list and the YouTube playlist, follow Fluxicon on LinkedIn, or add the café calendar to never miss a Process Mining Café in the future.

Workshops At Process Mining Camp 2024

Hands-on practice is at the very heart of this year’s Process Mining Camp on 13 and 14 June. Above, you can watch a short video we recorded to tell you more about the workshops on the second day of camp.

To help you advance your process mining skills, you will get to work on real-life data sets. You will be exposed to the same challenges that you typically also encounter in your own process mining projects, supported by our expert guidance. You will participate in three workshops that fit together perfectly, with each of them focusing on different skills that you need to be successful.

Sign up for a process mining practice that you can get nowhere else. If you buy your ticket before the end of Friday, you can still benefit from the early bird rate!

Here are some further details about the three workshops on the second day of camp:

Workshop 1 · Discovery and Analysis

There is a certain magic in watching an experienced process miner approach a new data set. While they apply standard practices like identifying incomplete cases, they also follow their intuition and explore the layers of the process in iterative cycles of discovery, questions, and analysis.

Sure, it takes domain knowledge to identify the standard process and its deviations. But it is also a question of experience to know what to look out for, which questions to ask, and how to break up the “spaghetti processes” — even if you are not familiar with the process at all.

In this workshop, you are going to practice on a real-life data set. Together, we will discover and analyze this data set in multiple phases. We show you standard practices and discuss the results in each phase before going further. There will be time to apply the same approach to multiple data sets, and you can even bring your own data, if you like!

Workshop 2 · Data Skills

It is important to accept that your process mining data will not be perfect when you start analyzing it. After all, the vast majority of IT systems were not created with process mining in mind. Recognizing these limitations should not discourage you. Instead, embrace it as an opportunity for getting creative! Learn to navigate the available data and its possibilities.

In this workshop, you will go beyond the fundamentals of process mining, which typically center around case IDs, activities, and timestamps. More than just talking about these concepts, you are going to experience them together to gain a better understanding of the associated challenges and explore potential remedies.

To achieve this, you will immerse yourself in a practical case. In multiple steps, you will identify and address issues like faulty and missing data. You will explore techniques for combining event logs from different systems. And you will be stretching the boundaries of the data beyond their original design. We’ll also be discussing what the logs of the future might look like, and what steps you can take today to achieve that vision.

Workshop 3 · Organizational Best Practices

How do you translate process mining insights into changes in your organization? How do you make sure that the improvements stick? And how do you develop your process mining projects into a routine and general practice? The challenges for process mining are not only about technical skills. You also need to take strategic and cultural factors into account.

There is no simple recipe for what makes process mining successful. But there are a lot of different lessons that can be derived from both successes and from setbacks. By now, we have all been talking about these challenges and our experiences for two days — in the discussion groups, over dinner and drinks, in the breaks, during the workshops. In this last workshop, we take a step back and collect what we have learned.

Together, we are going to make this really practical. We will list what works, what doesn’t work and in which situation — and we put it into perspective. You will leave camp with new approaches and patterns that you can apply right away in your future process mining projects.

— We can’t wait to see you all in Eindhoven!

Process Mining Café 30: Spatial Processes

Process Mining Café 30

In the next Process Mining Café tomorrow, we have invited Andrea Delgado and Daniel Calegari from the Universidad de la República, Uruguay, to talk about their process mining case study for urban mobility.

They discover process flows for the bus lines in Montevideo using open data from the city. They then export the XML process maps from Disco to display the discovered processes in the spatial city map. Andrea and Daniel share their scripts and will give you all the information that you need to reproduce their steps.

How could you do a similar analysis for your own city? Join us tomorrow, Wednesday, 20 March, at 15:00 CET to find out! (Check your timezone here). As always, no registration is required. Simply point your browser to fluxicon.com/cafe when it is time. You can watch the café and join the discussion while we are on the air, right there on the café website.

Sign up for the café mailing list here to receive a reminder one hour before the session starts.


Tune in live for Process Mining Café by visiting fluxicon.com/cafe on Wednesday, 20 March 2024, at 15:00 CET! Add the time to your calendar if you don’t want to miss it.

How To Bring Location Into Process Maps For Traffic Processes

STM system elements

This is a guest article by Andrea Delgado, Daniel Calegari, and Nicolás Carignani from the Computer Science Institute, School of Engineering, Universidad de la República, Uruguay. They have used open data to discover the process flows from bus lines in Montevideo. They then exported the XML process maps from Disco to display the discovered processes in the spatial city map. In this article, they explain their approach and give you all their scripts and the information that you need to reproduce the approach for your own city.

If you have a guest article or process mining case study that you would like to share, please get in touch with us via anne@fluxicon.com.

Urban mobility allows us to access housing, jobs, and urban services, and its planning, control, and analysis are of utmost importance to city governments. Transport modeling and planning must address several challenges, such as traffic congestion, public transport crowding, pedestrian difficulties, and atmospheric pollution. Smart cities use technology to improve urban services, e.g., Intelligent Transportation Systems that incorporate technologies generating real-time data that can be processed to extract valuable information.

In 2010, the Municipality of Montevideo, Uruguay, defined a Mobility Plan, which included the creation of the Metropolitan Transportation System (Sistema de Transporte Metropolitano, STM), serving a population of 1.4 million. The STM defines several city public transportation elements, adding an ITS with on-board GPS unit control systems on each bus and smart cards for citizens, as illustrated in Figure 1. The Mobility Management Center manages and controls traffic and transportation in the city using the real-time data provided by the STM.

STM system elements

Fig. 1: STM system elements: smart cards for citizens, buses with GPS, and smart card readers that register data for each trip.

We address a set of questions defined by the Municipality of Montevideo, in the context of a joint research project, concerning their transportation system (STM) data from buses and smart card trips, by applying process mining techniques. For this, we use open data provided by the STM within the governmental open data catalog. We provide business experts with another view on public transportation data to help them make decisions on the city’s urban mobility, considering different perspectives, including the behavior of bus lines according to the planned routes [1].

People working in the urban mobility domain are used to analyzing data in traditional formats (tables, graphics, etc.) and locating geospatial data on the city map. In our first attempt, we analyzed the reference process model corresponding to the bus routes and the data from the actual trips made by passengers within the buses using a traditional process mining approach. However, we realized that the results could be deployed over Montevideo’s city map to improve visualization and understanding of the outcomes.

In what follows, we present an extract of the bus line behavior analysis presented in [1], focusing on deploying the bus line process models provided by Disco over Montevideo’s city map using the open-source geospatial software QGis.

Open Data from the STM system

Below, we present some definitions of the Metropolitan Transport System (STM) that are needed to understand the data we used for the analysis:

  • A line is a public name by which a set of routes of a transportation company is known, e.g., 2, 103, 148, 306, 405, 526, D11.
  • A sub-line is one of the routes that a line has (differing in the streets traveled), with a direction.
  • A variant is an instance of the route of a sub-line, with a specific origin, destination, and direction (maximal, partial, or circular).
  • Every variant (line, sub-line, direction, origin, destination) has a set of frequencies that defines a departure time on a given day (working days, Saturdays, Sundays, and holidays).
  • Stops in the route of each variant are known, as well as the stop of origin and destination, all identified by a unique code.

The theoretical schedule by which a bus travels through each stop is also known for each frequency. Apart from the bus data, the information on passengers’ STM card payments is also known. There are different types of passengers, e.g., ordinary users, retirees, and students. Every time a payment is recorded, there is a record with the type of passenger, the stop at which the passenger boards, the variant and frequency to which the bus belongs, and the time of departure, among other information. Below, we present the primary datasets we used for the reference process models of the bus line routes and the process model of actual passengers’ trips that we show on the map of Montevideo, extracted from [1].

[UBS] Urban bus schedules, by stop. It contains the bus schedules for urban transportation. These are estimated theoretical schedules in which a bus line will pass through a particular stop along its route. These data are obtained by estimating each stop’s passing times according to the transport units’ average speed, predefined schedule, and distance between stops.

  • type of day: 1 - working days (Monday to Friday), 2 - Saturdays, 3 - Sundays and holidays.
  • cod variant: identifies the variant of the line.
  • frequency: identifies the frequency for the variant for the type of day, i.e., the specific hours.
  • cod loc stop: shows the code of each stop
  • ordinal: shows the number of stops within the bus line route, i.e., first (1), second (2), etc.
  • hour: shows the estimated hour for the bus to arrive at the stop
  • day before: indicates if the frequency started the day before (for late-night lines)

The UBS file contains data from more than 100 bus lines with more than 800 variants and 1000 corresponding frequencies. Four companies provide the service and cover the metropolitan area with more than 4700 stops.

[TSB] Trips made on STM buses. It contains all the trips made on the urban collective transport lines in Montevideo, broken down by operating company, line, variant, day and time, and the tickets sold at all stops in the system, by type of user, payment method, and sections of each trip. The information comes from all the records processed by the STM trip validation machines.

  • id trip: identifies the trip within the system; in combined tickets, the id trip is repeated in all buses the user rides while the ticket is valid.
  • line code: shows the code of the bus line
  • cod variant: identifies the variant of the line
  • frequency: identifies the frequency for the variant
  • with card: shows if the STM card was used in the payment of the trip
  • date event: timestamp of date and hour of the trip
  • trip type: combined (1 hour, 2 hours), student, retired, standard user.
  • origin stop code: shows the code of the stop at which the user got on the bus

The TSB file we used to analyze the STM trips corresponds to May 2022, which contains 25 million records, including all trips from the STM system registered within the 136 bus lines of Montevideo. For experimentation, we selected different bus lines from these to maximize the city coverage [1].

[BLOD] Bus lines, origin, and destination. It contains geographic information that, among other data, includes the lines’ description, origin, and destination. This dataset contains the shapefile that provides geospatial data for the bus line destinations for each bus line, sub-line, and variant.

[CTSC] Buses: stops and checkpoints. It contains geographic information on stops and control points. This dataset contains the shapefile with geospatial data for the stops defined within the city, referenced in the bus line routes.

Process mining applied to urban mobility

From the perspective of the behavior of bus lines according to the planned routes, we analyzed the data to answer some of the business experts’ questions, in particular, the ones related to bus lines and their use by passengers:

  1. What is the route of a bus line?
  2. How is the mobility of people within the STM?

Regarding the reference model of the bus lines (question 1), we can obtain it from the UBS file that contains the theoretical schedules. We select variants for each line, setting the variant and frequency codes and the day as {case ID}, the stop code as {activity ID}, the hour of the stop as {timestamp}, and the rest of the data fields as {attributes} of the activity. Since variants have different directions corresponding to the sub-line, we obtain two sequential sections, one in each direction. The reference model must contain the same stops in the same order as the ones defined for the variant in the STM system, which can be consulted on the STM schedules website.

Regarding users’ data on actual trips of the bus lines (question 2), we can discover a travel model for a bus line from the TSB file. We filter by line and select the variant, frequency, and day of the trip as {case ID}, the stop code as {activity ID}, the hour of each trip at each stop as {timestamp}, and the rest of the data fields as {attributes} of the activity. The number of events at each stop is the number of passengers boarding. Thus, loops represent multiple passenger boardings at the same stop. Unlike the reference model, the process is not linear since, in some frequencies, there are stops at which nobody boards. We can filter the data by day type (business days, Saturdays, Sundays, and holidays), hours of the day, seasons, type of user, company, etc.
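
As a hedged illustration of these two mappings, the following Python sketch prepares CSV files that can be imported into a process mining tool such as Disco, with the case ID, activity, and timestamp columns chosen as described above. The column names (cod_variant, frequency, day_type, ordinal, hour, cod_loc_stop, line_code, date_event, origin_stop_code, trip_type, with_card) are assumptions based on the field descriptions of the UBS and TSB datasets; adjust them to the actual headers of the open data files.

# Sketch: build importable event logs from the UBS and TSB open data files.
# Column names are assumed from the dataset descriptions above; adjust as needed.
import pandas as pd

def ubs_reference_log(path):
    """Reference model of a bus line (question 1): theoretical schedules."""
    ubs = pd.read_csv(path)
    ubs["case_id"] = (
        ubs["cod_variant"].astype(str) + "-"
        + ubs["frequency"].astype(str) + "-"
        + ubs["day_type"].astype(str)
    )
    return ubs.rename(columns={"cod_loc_stop": "activity", "hour": "timestamp"})[
        ["case_id", "activity", "timestamp", "ordinal"]
    ]

def tsb_trips_log(path, line_code):
    """Actual trips of one bus line (question 2): one event per passenger boarding."""
    tsb = pd.read_csv(path)
    tsb = tsb[tsb["line_code"] == line_code].copy()   # filter by bus line
    tsb["date_event"] = pd.to_datetime(tsb["date_event"])
    tsb["case_id"] = (
        tsb["cod_variant"].astype(str) + "-"
        + tsb["frequency"].astype(str) + "-"
        + tsb["date_event"].dt.date.astype(str)
    )
    return tsb.rename(
        columns={"origin_stop_code": "activity", "date_event": "timestamp"}
    )[["case_id", "activity", "timestamp", "trip_type", "with_card"]]

# Example usage: write CSVs that can then be imported into Disco
# ubs_reference_log("ubs.csv").to_csv("ubs_event_log.csv", index=False)
# tsb_trips_log("tsb.csv", line_code=125).to_csv("line125_trips_log.csv", index=False)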

Figure 2 (extracted from [1]) depicts examples of these two process models discovered in Disco for bus line 125, variant 667, which is the maximal variant in one direction. This bus line goes in one direction from the old city to the west, ending at Cerro Beach, and returns in the other direction.

In this figure, we can see: a) the bus line on the map taken from the STM schedules site, b) an excerpt of the reference model (i.e., bus stops included in the line), and c) an excerpt of the STM trips within its frequencies.

Bus line 125 route

Fig. 2: Bus line 125 route from [1] with stops for the 667 variant: a) in the STM website bus schedule, b) excerpt of the reference model, and c) excerpt of the STM trips within its frequencies

From users’ data on actual trips of the bus lines (question 2), it is possible to analyze load metrics after filtering and discovering the model. For example, it is possible to answer: At which stops do people not get on? At which stops do more people get on? At what times does the bus have more and fewer passengers? What happens to those lines in a particular area? Moreover, it is possible to answer performance questions such as: What is the total duration of the frequencies (average, maximum, minimum)? Are there frequencies with delays? As an example, we present in Figure 3 the performance model for bus line 300 (one of its variants), which goes from the south of the city, near the city center, to the city’s northern periphery, showing the delays in the transitions between stops of the bus line route, calculated with the actual trips data.

Bus line 300 route

Fig. 3: Bus line 300 route with stops for one of its variants: a) in the STM website bus schedule, b) and c) excerpt of performance models with delays in the transitions from the STM trips data.

Exporting from Disco and visualizing the models over Montevideo’s city map

As mentioned before, people working in the urban mobility domain are used to seeing locations on the city map, analyzing data, and considering geospatial data. Although the process models provided valuable information, they still had to be compared manually with the bus line route shown on the STM site’s city map to identify where certain behaviors arise. To improve the visualization for business experts, we developed a prototype to deploy process models over Montevideo’s city map.

Disco allows the export of XML files of the resulting process models, both for the process map and the performance view. As we selected the bus line stops as activities of the process, and each stop has a corresponding geospatial location in the city of Montevideo, we can locate each stop on the map and, correspondingly, the Disco process model we exported. Also, the dataset containing the data for each bus line route’s origin and destination is used to locate the reference bus line route on the map. These data correspond to the BLOD and CTSC files, whose shapefiles contain the geospatial data of the bus line routes and stops in Montevideo.

We used the open-source geospatial software QGIS, which supports loading city maps with corresponding coordinates worldwide, creating/adding layers with geospatial data in the shapefiles to be seen over the map, and developing scripts with Python to work with them. In Figure 4, we present the process flow for deploying the process models exported from Disco over the city map loaded in QGis with the help of the scripts we developed.

Process for deploying process models exported from Disco over the city map in QGis

Fig. 4: Process for deploying process models exported from Disco over the city map in QGis.

The first step is to load the event log into Disco, in our case the data corresponding to the passengers’ actual trips on a bus line, to discover the corresponding process model of the bus line route, which can be filtered by variant as shown above. Then, the resulting process model can be exported in XML format, which will be one of the inputs to QGis for displaying it over the city map. We want to show both the process map with the flow of the actual trips over the bus line and the performance map with the delays of the buses on the actual trips, as depicted in Figures 2 and 3. Figure 5 shows a screenshot of the XML model export we used in Disco.

Disco export model in XML format

Fig. 5: Disco export model in XML format

In QGis, we first load the shapefile data of the stops and of the origin and destination of buses over the city map as vector layers, and then the city map layer as an XYZ layer. These layers should use the same coordinate reference system (and zone) so that the locations match. We load the Python scripts we developed in the project options. Figure 6 shows a QGis screenshot with the three layers in the bottom left panel and the Python scripts in the right panel.
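
As a rough sketch of this loading step, the snippet below shows how the shapefile layers and an XYZ basemap might be added from the QGIS Python console. The file paths and the tile URL are placeholder assumptions; the authors’ own scripts and data are linked at the end of the article.

# Sketch: load the stop and bus-line shapefiles plus an XYZ basemap in QGIS.
# Intended for the QGIS Python console; paths and tile URL are placeholders.
from qgis.core import QgsProject, QgsVectorLayer, QgsRasterLayer

project = QgsProject.instance()

# Vector layers from the open-data shapefiles (CTSC: stops, BLOD: origins and destinations)
stops = QgsVectorLayer("data/ctsc_stops.shp", "Bus stops", "ogr")
lines = QgsVectorLayer("data/blod_lines.shp", "Bus lines origin/destination", "ogr")

# City map as an XYZ tile layer (here OpenStreetMap tiles as an example source)
xyz_uri = "type=xyz&url=https://tile.openstreetmap.org/{z}/{x}/{y}.png&zmax=19&zmin=0"
basemap = QgsRasterLayer(xyz_uri, "Montevideo city map (XYZ)", "wms")

for layer in (basemap, stops, lines):
    if layer.isValid():
        project.addMapLayer(layer)
    else:
        print("Could not load layer:", layer.name())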

QGis software with Montevideo city map layer

Fig. 6: QGis software with Montevideo city map layer (XYZ), bus stops over the city (from CTSC shapefile), and bus lines origin and destination over stops (from BLOD shapefile)

The Python scripts we developed transform the Disco process models into a QGis layer that can be deployed over the city map, using as reference the stops and the bus line origin and destination geospatial data we first loaded over the city map. The QGis scripts are as follows:

  • layer_from_disco_model_flow.py: transforms a Disco process model flow (XML) into a QGIS layer, which allows visualization over the map (a simplified sketch of this step is shown after this list).
  • legs_for_variant.py: divides a bus line into sections (consecutive stops), which is then used as input for the performance process model calculations.
  • layer_from_disco_model_performance.py: transforms a Disco performance process model (XML) into a QGIS layer using the legs_for_variant output layer as input.
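
As a rough, hedged sketch of what the first of these transformations can look like: the snippet below parses a hypothetical Disco XML export in which each arc connects two activity (stop) codes, looks up the stop coordinates in the previously loaded stops layer, and builds an in-memory QGIS line layer. The XML element and attribute names (arc, source, target) and the stop-code field name (stop_code) are assumptions made for illustration; the authors’ actual scripts, linked at the end of the article, handle the real export format.

# Sketch of layer_from_disco_model_flow: turn a Disco XML export into a QGIS line layer.
# XML element/attribute names and the shapefile field name are assumptions.
import xml.etree.ElementTree as ET
from qgis.core import QgsFeature, QgsGeometry, QgsProject, QgsVectorLayer

def layer_from_disco_flow(xml_path, stops_layer, stop_field="stop_code"):
    """Create a memory line layer with one feature per arc of the Disco model."""
    # Index the stop geometries by their code from the stops layer (CTSC shapefile)
    stop_points = {
        str(f[stop_field]): f.geometry().asPoint() for f in stops_layer.getFeatures()
    }

    # Parse the exported process model; here we assume <arc source=".." target=".."> elements
    root = ET.parse(xml_path).getroot()
    arcs = [(a.get("source"), a.get("target")) for a in root.iter("arc")]

    # In-memory line layer in the same CRS as the stops layer
    flow = QgsVectorLayer("LineString?crs=" + stops_layer.crs().authid(),
                          "Disco process flow", "memory")
    provider = flow.dataProvider()
    for source, target in arcs:
        if source in stop_points and target in stop_points:
            feature = QgsFeature()
            feature.setGeometry(QgsGeometry.fromPolylineXY(
                [stop_points[source], stop_points[target]]))
            provider.addFeatures([feature])
    flow.updateExtents()
    QgsProject.instance().addMapLayer(flow)
    return flow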

To illustrate the process in QGis and its results, we use the bus line 21 route, which we show in Figure 7 for the 7488 variant: a) the bus line on the map taken from the STM schedules site, and c) an excerpt of the STM trips within its frequencies, which is the one we import into QGis. This bus line goes in one direction from Independence Square (the entrance of the old city) to the east, parallel to the coast, and ends at the Portones Shopping Center in the Punta Gorda neighborhood. It has extensions of the route to the old city (left) and to the entrance of the Canelones Department (right).

Bus line 21 route with stops for the 7488 variant

Fig. 7: Bus line 21 route with stops for the 7488 variant: a) in the STM website Bus schedule, b) excerpt of the reference model, and c) excerpt of the STM trips within its frequencies

The process model exported from Disco corresponding to Figure 7 c), transformed into a QGis layer, can be seen over the Montevideo city map in Figure 8, with a zoom over the first sections (left). It can be seen that, unlike the views presented in Figure 7 a) and b), it has arcs that connect non-consecutive stops, which reflects, as we mentioned, the actual trips over the bus line route: in some frequencies, no passengers get on the bus at some stops.

Process model exported from Disco deployed over Montevideo city map

Fig. 8: Process model flow view exported from Disco discovered from the STM trips actual data for the 21 bus line route and variant 7488 deployed over Montevideo city map

Zooming in again on the upper-right corner of the flow, we can see the details of the process model flow in Disco and the corresponding process model deployed over the city map, as shown in Figure 9: a) the process model on the city map, and b) the process model flow in Disco. It can be seen that stop 2126 lies to the left on the map, stop 2131 to the right, and stop 2128 in the middle, with the flow going to the right. If the two views of the model in Figure 9 a) and b) are compared, the difference is noticeable. The map gives much greater context, allowing one to quickly know the exact location of the activities (stops in this case), the corresponding area, and the neighborhood, and to relate the generated model to information not included in the data.

Zoom of the process model of the bus line 21

Fig. 9: Zoom of the process model of the bus line 21 route upper-right corner flow and correspondence with the Disco process map

In this case, we could further investigate why, on several occasions, no passengers get on the bus at stop 2128. Some possible causes are the following: the stop infrastructure is worse than that of other nearby stops, there are security issues in the area, or the stop is mainly used by specific groups of users (e.g., students) around school start and end times.

The performance process model exported from Disco corresponding to Figure 7 c), transformed into a QGis layer, can be seen over the Montevideo city map in Figure 10: a) the model exported from Disco with delays in the transitions between the stops, and b) the model deployed on the map showing section delays using the traffic-light metaphor: green, yellow, and red for increasingly significant delays.

Performance process model for the bus line 21

Fig. 10: Performance process model for the bus line 21 route from the bus actual trips data: a) the model exported from Disco with delays in transitions, and b) deployed in the map showing sections delays

Again, as in Figure 8, being able to visualize on the map the specific sections of the bus line route that present the worst times provides much greater context, allowing one to quickly identify the zones to which these sections belong. In this case, it can be seen that the section to the left, which goes from the Centenario stadium (the round green space marked in the middle, below the bus line) to the old city, presents the worst delays. This is consistent with that zone being a bustling one: the route runs along Av. Italia, passes the Three Crosses bus terminal, and traverses the main central avenue, Av. 18 de Julio, from start to end towards the old city entrance (Independence Square).

Conclusions and reproducibility for other cities

We have presented a case study on the application of process mining to analyze urban mobility open data from the STM transport system in Montevideo, Uruguay. We showed how, using Disco, we can discover the reference process model of the bus lines using the stops defined for each one, as well as the STM trips process model that shows the actual behavior of the bus lines in terms of passengers’ use within the different frequencies (actual trips) of the buses.

Because business people in this specific domain normally visualize and analyze data over the city map, we developed a prototype that, using Disco’s process model export feature in XML format, allows us to import the models into the open-source geospatial software QGis and deploy them over the city map of Montevideo. This provides much greater context to users, as discussed in the previous section.

Regarding reproducibility for other cities, exporting the process models from Disco, importing the layers for the stops and bus line routes over the city map, and generating the QGis layers for the process models can be applied straightforwardly. For the scripts to work, the process models exported from Disco should use the stops of the bus lines as activities, and the files containing the shapefiles with the geospatial data of the stops and bus line routes should be available. The city map layer is loaded as an XYZ layer directly in QGis, which is available for all cities. The data come from the bus trips system of the desired city (i.e., the records of the passengers’ trips) and will probably have to be manipulated to provide the same fields that we used (e.g., bus line, variant, and stops to relate to the shapefiles), or the scripts can be adapted to the data provided by the system under analysis.

The scripts and data are publicly available for further experimentation and analysis.

References

[1] Delgado, A., Calegari, D., Process Mining for Improving Urban Mobility in Smart Cities: Challenges and Application with Open Data, 56th Hawaii International Conference on System Sciences (HICSS-56), Maui, Hawaii, USA, Scholarspace, 2023. https://hdl.handle.net/10125/102846

[2] Rodao, B., Carignani, N., Ferreira, S., Minería de Procesos para el análisis de movilidad urbana. Tesis de grado. Universidad de la República (Uruguay). Facultad de Ingeniería, 2023. (In Spanish) https://hdl.handle.net/20.500.12008/42549

Authors

Andrea Delgado - adelgado@fing.edu.uy - https://www.fing.edu.uy/~adelgado

Daniel Calegari - dcalegar@fing.edu.uy - https://www.fing.edu.uy/~dcalegar

Nicolás Carignani - nicolas.carignani@fing.edu.uy