One of the most interesting use cases for process mining is understanding legacy systems. In many cases, the original developers are long gone by the time changes must be made, and simply maintaining these often mission-critical systems becomes a huge burden.
Steve Kilner has just published two articles on process mining for legacy systems in IBM Systems Magazine:
- In the first article, ‘Reinvent Your Business With Business Process Mining’, he writes about the opportunities of understanding software systems from a business perspective.
- In the second, more technically oriented article, ‘Process Mining: A Living View of Systems’, he shows in detail which journaling parameters need to be set and even provides open-source example code that you can use to extract the data from your IBM system.
Steve is an expert on IBM AS/400 systems and runs vLegaci, a company specializing in software management. I recommend heading over to the IBM Systems Magazine website, where you can read both articles online.
I also asked Steve to answer three questions here on this blog. You can read the interview below.
Anne: Steve, why is so-called greenfield development, where you make a fresh start, often not possible, so that people have to put up with all these old systems that nobody understands anymore?
Steve: Replacing legacy systems is costly, risky and disruptive to organizations. In typical legacy languages such as COBOL, applications may consist of a few million lines of code. A common estimate is that for every million lines of code in business applications there are about 30,000 business rules. How costly, risky and disruptive is it to redevelop tens of thousands of business rules? Whatever intelligence you can recover from your existing code is extremely valuable for either feeding the development of new systems, or identifying required functionality for purchasing off-the-shelf packages.
Anne: What does process mining add compared to traditional approaches such as static code analysis techniques?
Steve: Anyone who has been a programmer working with existing code knows that it is impossible to look at a large program, let alone an entire system, and grasp everything that could happen within it. A subsystem with hundreds of conditional statements contains many millions of possible paths through the code. No one can fully comprehend all those possibilities. By creating or obtaining event logs of executing programs, through program instrumentation if necessary, it is possible to observe the paths that are actually used, along with their frequency. By examining individual cases, it is then possible to correlate data inputs with the resulting path variances.
Best practice is surely to combine static analysis with dynamic analysis via process mining and other techniques. This provides deeper, more multidimensional insight into system behavior.
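To make the idea of observing "the paths that are actually used, along with their frequency" a little more concrete, here is a minimal Python sketch. It is not taken from Steve's articles: the `(case_id, timestamp, activity)` layout, the function name `path_variants`, and the sample activities are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def path_variants(events):
    """Count how often each distinct path (sequence of activities) occurs.

    `events` is an iterable of (case_id, timestamp, activity) tuples.
    This layout is an assumption for the sketch, not a real log format.
    """
    traces = defaultdict(list)
    # Order events by timestamp, then collect the activities of each case.
    for case_id, _timestamp, activity in sorted(events, key=lambda e: e[1]):
        traces[case_id].append(activity)
    # Identical activity sequences are counted as one variant.
    return Counter(tuple(trace) for trace in traces.values())


# Hypothetical event log: three sessions running the same program.
events = [
    ("s1", 1, "ENTER_ORDER"), ("s1", 2, "CHECK_CREDIT"), ("s1", 3, "SHIP"),
    ("s2", 4, "ENTER_ORDER"), ("s2", 5, "CHECK_CREDIT"), ("s2", 6, "SHIP"),
    ("s3", 7, "ENTER_ORDER"), ("s3", 8, "CHECK_CREDIT"), ("s3", 9, "HOLD"),
]

variants = path_variants(events)
for path, count in variants.most_common():
    print(count, " -> ".join(path))
```

Out of the many paths that are theoretically possible, only those that actually occur show up in the result, each with its frequency. Dedicated process mining tools do much more than this, but the underlying grouping of events into cases is the same idea.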
Anne: How difficult is it to extract the data from a legacy system, and how long does it take?
Steve: A simple starting point for most systems is to use database transaction logs. Most logs have some sort of session ID that can be the basis for cases. A step further is to extract key data identifiers from the transaction log, for example order number, customer number, etc., and use these as case IDs. This then expands the view of activities across sessions, enabling you to understand how orders, customers, etc. behave. A further step is to engage in program instrumentation, where you explicitly insert logging functions into the code in order to capture how programs are executing internally. I have used this recently for a client engaged in a modernization project, where we are logging every call to every subroutine and every screen input. This gives us an excellent view into a huge monolithic piece of legacy code.
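The difference between using a session ID and a business identifier as the case ID can be sketched in a few lines of Python. The record fields (`session`, `ts`, `op`, `table`, `order_no`) and the helper `build_cases` are hypothetical; real transaction logs or journals will look different and need their own extraction step.

```python
from collections import defaultdict


def build_cases(records, case_key):
    """Group log records into cases, using the field `case_key` as case ID."""
    cases = defaultdict(list)
    for rec in sorted(records, key=lambda r: r["ts"]):
        case_id = rec.get(case_key)
        if case_id is None:
            continue  # record carries no value for this identifier
        # Use "operation + table" as a simple activity name.
        cases[case_id].append(f'{rec["op"]} {rec["table"]}')
    return dict(cases)


# Hypothetical transaction log entries (field names are assumptions).
records = [
    {"session": "S1", "ts": 1, "op": "INSERT", "table": "ORDERS", "order_no": "1001"},
    {"session": "S1", "ts": 2, "op": "INSERT", "table": "ORDERS", "order_no": "1002"},
    {"session": "S2", "ts": 3, "op": "UPDATE", "table": "ORDERS", "order_no": "1001"},
    {"session": "S3", "ts": 4, "op": "INSERT", "table": "INVOICES", "order_no": "1001"},
    {"session": "S3", "ts": 5, "op": "UPDATE", "table": "ORDERS", "order_no": "1002"},
]

by_session = build_cases(records, "session")  # what happened in each session
by_order = build_cases(records, "order_no")   # what happened to each order

print(by_session)
print(by_order)
```

Grouped by session, order 1001 is scattered over three separate cases. Grouped by order number, its full history across sessions appears as a single trace, which is the expanded view described above.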
Anne: Thank you, Steve!
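For readers who want a feel for the program instrumentation Steve mentions, where every subroutine call is logged, here is a small Python sketch. It only illustrates the principle: legacy code on IBM systems would be instrumented in its own language, and the decorator `logged`, the in-memory `EVENT_LOG`, and the sample subroutines are all invented for this example.

```python
import functools

# In-memory stand-in for a log file or log table (an assumption of this sketch).
EVENT_LOG = []


def logged(func):
    """Record one event per call: (case ID, sequence number, subroutine name)."""
    @functools.wraps(func)
    def wrapper(case_id, *args, **kwargs):
        # The first argument is treated as the case ID; the current log
        # length serves as a simple sequence number.
        EVENT_LOG.append((case_id, len(EVENT_LOG), func.__name__))
        return func(case_id, *args, **kwargs)
    return wrapper


@logged
def check_credit(order_no, amount):
    return amount <= 5000  # hypothetical business rule


@logged
def ship_order(order_no):
    return "shipped"


@logged
def hold_order(order_no):
    return "held"


@logged
def process_order(order_no, amount):
    if check_credit(order_no, amount):
        return ship_order(order_no)
    return hold_order(order_no)


process_order("1001", 120)
process_order("1002", 9000)

for event in EVENT_LOG:
    print(event)
```

Each order leaves behind the sequence of subroutines it actually passed through, so the two orders end up with different paths depending on their input data. An event log of this shape can then be analyzed like any other.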