Process Mining : April 2016

Saturday, April 16, 2016

Data Requirements for Process Mining

One of the big advantages of Process Mining is that it starts with the data that is already there, and usually it starts very simple. There is no need to first set up a data collection framework. Instead, you can use data that accumulate as a byproduct of the increasing automation and digitization of your business processes. These data are collected right now by the various IT systems you already have in place to support your business.

If you are interested in Process Mining but are still new to this area, you probably have the following question:

What kind of data do I need to do process mining?

Or, if you have heard about process mining through academia, you might ask:

What exactly is an event log?

This post aims to answer both questions.

The core idea of process mining is to analyze data from a process perspective. You want to answer questions such as “What does my as-is process currently look like?”, “Are there waste and unnecessary steps that could be eliminated?”, “Where are the bottlenecks?”, and “Are there deviations from the rules and prescribed processes?”.

To be able to do that, Process Mining approaches data with a mental model that maps the data to a process view.

Classification in data mining

To understand what this means, let us first take a look at another mental model: The mental model for classification in data mining.

Assume that you have a widget factory and you want to understand which kinds of customers are buying your widgets. On the left side below, you see a very simple example of a data set. There are columns for the attributes Name, Salary, Sex, Age, and Buy widget. Each row forms one instance in the data set that can be used for learning the classification rules.

Before the classification algorithm can be started, one needs to determine which of the columns is the target class. Because we want to find out who is buying the widgets, we would make the Buy widget column the classification target. A data mining tool such as Weka would then be able to construct a decision tree like the one depicted on the right.

The result shows that only males with a high salary are buying the widgets. If we wanted to derive rules for another attribute, for example, to predict how old the customers who buy our widgets typically are, then the Age column would be the classification target.
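To make the role of the classification target concrete, here is a minimal sketch in Python. It is not a decision tree learner and not what Weka does internally: it only counts the target class for each combination of attribute values, which is the kind of split a tree learner looks for. The rows and the function name are invented for illustration; only the column names come from the example above.

```python
from collections import Counter, defaultdict

# Hypothetical instances; only the column names come from the example above.
rows = [
    {"Name": "Alice", "Salary": "high", "Sex": "female", "Age": 34, "Buy widget": "no"},
    {"Name": "Bob",   "Salary": "high", "Sex": "male",   "Age": 45, "Buy widget": "yes"},
    {"Name": "Carol", "Salary": "low",  "Sex": "female", "Age": 29, "Buy widget": "no"},
    {"Name": "Dave",  "Salary": "low",  "Sex": "male",   "Age": 52, "Buy widget": "no"},
    {"Name": "Erik",  "Salary": "high", "Sex": "male",   "Age": 38, "Buy widget": "yes"},
]

def class_distribution(rows, features, target):
    """Count the values of the target column per combination of feature values."""
    distribution = defaultdict(Counter)
    for row in rows:
        key = tuple(row[feature] for feature in features)
        distribution[key][row[target]] += 1
    return distribution

# The target column is a choice made before the analysis starts.
distribution = class_distribution(rows, features=("Sex", "Salary"), target="Buy widget")
for key, counts in sorted(distribution.items()):
    print(key, dict(counts))
```

Choosing a different target only means passing another column name as `target`; each row remains one complete instance.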

The mental model for process mining

For process mining, we have a slightly different mental model in mind because we look at the data from a process perspective.

Below, you see a simplified example data set from an internal call center case study. In contrast to the data mining example, an individual row does not represent a complete process instance, but just an event. That’s where the term event log comes from.

Each event corresponds to an activity that was executed in the process.
Multiple events are linked together in a process instance or case.
Logically, each case forms a sequence of events, ordered by their timestamp.

From the data sample below, you can see why even doing simple process-related analyses, such as measuring the frequency of process flow variants, or the time between activities, is impossible using standard tools such as Excel. Process instances are scattered over multiple rows in a spreadsheet (not necessarily sorted!) and can only be linked by adopting a process-oriented mental model.

If you look at the highlighted rows 6–9, you can see one process instance (case 9705) that starts with the status Registered on 20 October 2009, moves on to At specialist and In progress, and ends with status Completed on 19 November 2009.
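The process view described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular tool works: only the case number 9705, its four statuses, and its first and last dates come from the example above, while the intermediate dates, the second case, and all function names are invented.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical event rows (case ID, activity, timestamp), deliberately unsorted.
# Intermediate dates of case 9705 and all of case 9706 are invented.
events = [
    ("9705", "In progress",   date(2009, 11, 5)),
    ("9706", "Registered",    date(2009, 10, 21)),
    ("9705", "Registered",    date(2009, 10, 20)),
    ("9705", "Completed",     date(2009, 11, 19)),
    ("9706", "Completed",     date(2009, 10, 23)),
    ("9705", "At specialist", date(2009, 10, 28)),
]

def build_cases(events):
    """Group events by case ID and order each case by timestamp."""
    cases = defaultdict(list)
    for case_id, activity, timestamp in events:
        cases[case_id].append((timestamp, activity))
    return {case_id: sorted(rows) for case_id, rows in cases.items()}

def variant_counts(cases):
    """Count how many cases follow each distinct sequence of activities."""
    return Counter(
        tuple(activity for _, activity in rows) for rows in cases.values()
    )

def case_duration_days(rows):
    """Days between the first and the last event of one case."""
    return (rows[-1][0] - rows[0][0]).days

cases = build_cases(events)
variants = variant_counts(cases)
print([activity for _, activity in cases["9705"]])
print(case_duration_days(cases["9705"]), "days")
```

Once the rows are grouped by case and sorted by timestamp, the variant frequencies and the time between activities follow from simple loops. The grouping step is the part a flat spreadsheet does not give you.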

The three requirements

The basis of process mining is to look at historical process data precisely with such a “process lens”. It’s actually quite simple. Regardless of where your data come from (database, log files, Excel sheet, data warehouse, etc.), the three minimal requirements are the following:

Case ID: A case identifier, also called process instance ID, is necessary to distinguish different executions of the same process. What precisely the case ID is depends on the domain of the process.

For example, in a call center, the case ID would be a service request number. In a hospital, this would be the patient ID.
Activity: There should be names for different process steps or status changes that were performed in the process. If you have only one entry (one row) for each process instance, then your data is not detailed enough.

Your data needs to be on the transactional level (you should have access to the history of each case) and should not be aggregated to the case level.
Timestamp: At least one timestamp is needed to bring the events into the right order. Of course, you also need timestamps to identify delays between activities and bottlenecks in your process.

If you have a start and complete timestamp for each activity in the process, then a distinction between active and idle times in the process becomes possible.
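As a rough sketch of the distinction between active and idle times, the following Python snippet splits one case into time spent inside activities and time spent waiting between them. The activities, their timestamps, and the function name are invented for illustration, and the sketch assumes that the activities of a case do not overlap (any overlap is counted as zero idle time).

```python
from datetime import datetime, timedelta

# Hypothetical activities of one case: (activity, start, complete).
activities = [
    ("Registered",    datetime(2009, 10, 20, 9, 0),   datetime(2009, 10, 20, 9, 30)),
    ("At specialist", datetime(2009, 10, 21, 9, 30),  datetime(2009, 10, 21, 11, 30)),
    ("Completed",     datetime(2009, 10, 21, 15, 30), datetime(2009, 10, 21, 16, 0)),
]

def active_and_idle(activities):
    """Return (active time, idle time) for the activities of one case."""
    ordered = sorted(activities, key=lambda item: item[1])
    # Active time: sum of the durations of the activities themselves.
    active = sum((end - start for _, start, end in ordered), timedelta())
    # Idle time: gaps between one activity's completion and the next start.
    idle = sum(
        (max(nxt[1] - cur[2], timedelta()) for cur, nxt in zip(ordered, ordered[1:])),
        timedelta(),
    )
    return active, idle

active, idle = active_and_idle(activities)
print("active:", active, "idle:", idle)
```

With only a single timestamp per activity, the same loop can still measure the time between events, but it cannot tell working time from waiting time.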

Additional columns can be included for the analysis if available. For example, in the data sample there are further attributes that categorize the service request: A case was opened by phone, resolved by an external specialist, and the urgency was categorized as level 2. We might also include the resource or department that performed an activity. But the mandatory columns are just the three requirements above.
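A quick check of these minimal requirements could look like the sketch below. The column names (`case_id`, `activity`, `timestamp`, `channel`, `urgency`), the sample rows, and the function name are assumptions for illustration; real exports will use their own headers. The "one row per case" test is only a heuristic for data that has been aggregated to the case level.

```python
import csv
import io
from collections import Counter

# Hypothetical column names; map them to the headers of your own export.
REQUIRED_COLUMNS = ("case_id", "activity", "timestamp")

def check_event_log(rows):
    """Return a list of problems; an empty list means the minimum is met."""
    if not rows:
        return ["no rows"]
    missing = [col for col in REQUIRED_COLUMNS if col not in rows[0]]
    if missing:
        return ["missing columns: " + ", ".join(missing)]
    events_per_case = Counter(row["case_id"] for row in rows)
    if max(events_per_case.values()) == 1:
        return ["one row per case: data looks aggregated to the case level"]
    return []

# Invented sample with two optional attribute columns (channel, urgency).
sample = io.StringIO(
    "case_id,activity,timestamp,channel,urgency\n"
    "9705,Registered,2009-10-20,phone,2\n"
    "9705,Completed,2009-11-19,phone,2\n"
)
rows = list(csv.DictReader(sample))
problems = check_event_log(rows)
print(problems or "minimal requirements met")
```

Extra columns such as the channel or urgency pass through untouched, so they stay available for the analysis.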

Summary

To summarize, all you need are data that can be linked to a case ID, activities, and timestamps. It does not matter where these data come from (ERP, CRM, workflow logs, ticketing system, PDM, HIS records, legacy log files, and so on), and you don’t need a BPM system with pre-modelled process models to get started with process mining.

It is one of the big advantages that process mining does not depend on specific automation technology or specific systems. It is a source system-agnostic technology, precisely because it is centered around the process-oriented mental model explained above.

What You Can Do With Process Mining

Source: https://fluxicon.com/blog/2015/10/why-process-mining-is-ideal-for-data-scientists/

Process Mining is not a reporting tool, but an analysis tool. It enables you to quickly analyse any process, even a very complex one. Examples are so-called click streams from websites, which show how visitors navigate a webpage (and where they “drop out” or “wander around” due to poor usability of the page). Or take the new workflow system in your company, which has only recently been established and for which the department now wants to know how many cases really follow the redesigned, streamlined process path.

You can display the activity flow as well as the transfer between departments in different views of the process, identify bottlenecks, and investigate unwanted or long-running paths within the process.

These process views can also be animated to help in the communication with the department: the actual processes based on the timestamps from the data are ‘replayed’ and show in a very tangible way where the problems in the process are.
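One common way to approximate such process views is to count which step directly follows which other step within each case. The sketch below does this once for activities and once for departments. It is only an illustration of the idea, not the algorithm of any particular tool, and the traces, department names, and function name are invented.

```python
from collections import Counter

# Hypothetical traces: each case is a time-ordered list of (activity, department).
traces = {
    "case-1": [("Registered", "Front office"), ("At specialist", "Back office"),
               ("Completed", "Front office")],
    "case-2": [("Registered", "Front office"), ("Completed", "Front office")],
    "case-3": [("Registered", "Front office"), ("At specialist", "Back office"),
               ("Completed", "Front office")],
}

def directly_follows(traces, field):
    """Count how often one value is directly followed by another within a case."""
    counts = Counter()
    for events in traces.values():
        values = [event[field] for event in events]
        counts.update(zip(values, values[1:]))
    return counts

activity_flow = directly_follows(traces, field=0)  # view on activities
handovers = directly_follows(traces, field=1)      # view on departments

for (source, target), frequency in activity_flow.most_common():
    print(f"{source} -> {target}: {frequency}")
```

Each counted pair corresponds to an arrow in a process map, and the count gives its frequency. Replacing the count with the time difference between the two events would highlight slow transitions instead.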

Why Data Scientists Should Become Familiar with Process Mining

Data science teams around the world are starting to look into Process Mining because:

Process Mining fills a gap which is not covered by existing data mining, statistics, and visualization tools. For example, data mining techniques can extract decision trees, predictions, or frequent patterns, but cannot display complete processes.
Data scientists with their skills to extract, link, and prepare data are ideally equipped to exploit the full potential of Process Mining. For example, the data of different IT systems, such as the CRM data, the calls in the call center of a bank, and the interactions with the customer advisor in the branch, must be linked with each other in a ‘Customer Journey’ analysis.
Analytical results must be communicated with the business. Data Science Teams do not analyse data for themselves, but to solve problems and issues for the business. If these questions revolve around processes, then charts and statistics are only meaningful in a limited way and are often too abstract. Process Mining allows you to provide a visual representation to the process owner, and also to directly profit from their domain knowledge in interactive analysis workshops. This allows you to find and implement solutions quickly.

Next Steps

Are you curious and want to know more about Process Mining? We recommend the following links:

15 minute recording, Presentation on Process Mining at bpmNEXT (with introduction and live demo): youtu.be/ql1S1wAxJ0E?t=10s
Introductory article on Process Mining: fluxicon.com/s/pmarticle

Two free online courses (so-called MOOCs) have recently started, which offer an introduction to the topic of Process Mining:

The ‘Process mining: Data science in Action’ MOOC at Coursera is a course given by Prof. Wil van der Aalst himself and provides a comprehensive picture of the foundations and the background of Process Mining algorithms: www.coursera.org/course/procmin
The ‘Fundamentals of BPM’ MOOC of the Queensland University of Technology generally has a business process management focus but also includes a practical segment about Process Mining: moocs.qut.edu.au/learn/fundamentals-of-bpm-october-2015

To really get a good picture of what Process Mining can do (and what it can’t do), it is best to try it out yourself. Here are two easily accessible ways to get started:

The academic Process Mining platform ‘ProM’ is Open Source and contains hundreds of plug-ins with the latest Process Mining algorithms: promtools.org
For an easy introduction and for the professional Power User you can download the demo version of our Process Mining software ‘Disco’ from the following webpage: fluxicon.com/disco/

Why Web Usage Mining?

In this paper, we will focus on Web usage mining. The reasons are simple: with the explosion of E-commerce, the way companies do business has changed. E-commerce, mainly characterized by electronic transactions through the Internet, has provided us with a cost-efficient and effective way of doing business. The growth of some E-businesses is astonishing, considering how E-commerce has turned Amazon.com into the so-called “on-line Wal-Mart”. Unfortunately, to most companies, the Web is nothing more than a place where transactions take place. They do not realize that, as millions of visitors interact daily with Web sites around the world, massive amounts of data are being generated. Nor do they realize that this information could be very valuable to the company for understanding customer behavior, improving customer service and relationships, launching targeted marketing campaigns, measuring the success of marketing efforts, and so on.

Web usage mining has emerged as the essential tool for realizing more personalized, user-friendly and business-optimal Web services. Advances in data pre-processing, modeling, and mining techniques, applied to the Web data, have already resulted in many successful applications in adaptive information systems, personalization services, Web analytics tools, and content management systems. As the complexity of Web applications and of users’ interactions with these applications increases, the need for intelligent analysis of the Web usage data will also continue to grow. Usage patterns discovered through Web usage mining are effective in capturing item-to-item and user-to-user relationships and similarities at the level of user sessions. However, without the benefit of deeper domain knowledge, such patterns provide little insight into the underlying reasons for which such items or users are grouped together. Furthermore, the inherent and increasing heterogeneity of the Web has required Web-based applications to more effectively integrate a variety of types of data across multiple channels and from different sources. Thus, a focus on techniques and architectures for more effective integration and mining of content, usage, and structure data from different sources is likely to lead to the next generation of more useful and more intelligent applications, and more sophisticated tools for Web usage mining that can derive intelligence from user transactions on the Web.

What is Web Mining?

Web mining can be broadly defined as the discovery and analysis of useful information from the World Wide Web. Based on the different emphasis and different ways to obtain information, web mining can be divided into two major parts: Web Contents Mining and Web Usage Mining. Web Contents Mining can be described as the automatic search and retrieval of information and resources available from millions of sites and on-line databases through search engines / web spiders. Web Usage Mining can be described as the discovery and analysis of user access patterns, through the mining of log files and associated data from a particular Web site.
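As an illustration of what mining log files can mean in practice, the sketch below parses a few web server log lines and groups them into user sessions. Several things here are assumptions and not part of the text above: the lines are invented and follow the Common Log Format, a visitor is approximated by the client address, and a new session starts after 30 minutes of inactivity.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed input format: Common Log Format, as written by many web servers.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)
SESSION_TIMEOUT = timedelta(minutes=30)  # assumed inactivity threshold

# Invented log lines using documentation-only IP addresses.
log_lines = [
    '203.0.113.7 - - [10/Oct/2015:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '203.0.113.7 - - [10/Oct/2015:13:57:02 +0000] "GET /products.html HTTP/1.1" 200 5120',
    '198.51.100.4 - - [10/Oct/2015:14:01:10 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '203.0.113.7 - - [10/Oct/2015:15:30:00 +0000] "GET /index.html HTTP/1.1" 200 2326',
]

def parse_line(line):
    """Return (host, timestamp, path) or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    timestamp = datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z")
    return match["host"], timestamp, match["path"]

def sessionize(lines):
    """Group page requests per host into sessions split by inactivity."""
    hits_by_host = defaultdict(list)
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            host, timestamp, path = parsed
            hits_by_host[host].append((timestamp, path))
    sessions = []
    for host, hits in hits_by_host.items():
        hits.sort()
        current = [hits[0]]
        for hit in hits[1:]:
            if hit[0] - current[-1][0] > SESSION_TIMEOUT:
                sessions.append((host, [path for _, path in current]))
                current = [hit]
            else:
                current.append(hit)
        sessions.append((host, [path for _, path in current]))
    return sessions

sessions = sessionize(log_lines)
for host, pages in sessions:
    print(host, pages)
```

Each resulting session is an ordered list of pages, which is the unit that access patterns are computed on.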

What are the applications of Web Mining?

Web mining extends analysis much further by combining other corporate information with Web traffic data. This allows accounting, customer profile, inventory, and demographic information to be correlated with Web browsing, which answers complex questions such as:

· Of the people who hit our Web site, how many purchased something?

· Which advertising campaigns resulted in the most purchases, not just hits?

· Do my Web visitors fit a certain profile? Can I use this for segmenting my market?
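The first two questions can be sketched as a simple join between Web traffic data and purchase records. Everything in this example is hypothetical: the visitor IDs, the campaign names, the amounts, and the assumption that each visitor is attributed to exactly one campaign.

```python
from collections import Counter

# Hypothetical Web traffic data: one campaign per visitor (an assumption).
visits = [
    {"visitor": "v1", "campaign": "newsletter"},
    {"visitor": "v2", "campaign": "banner"},
    {"visitor": "v3", "campaign": "banner"},
    {"visitor": "v4", "campaign": "newsletter"},
]

# Hypothetical purchase records from another corporate system.
purchases = [
    {"visitor": "v1", "amount": 40.0},
    {"visitor": "v4", "amount": 15.0},
    {"visitor": "v4", "amount": 25.0},
]

def share_of_buyers(visits, purchases):
    """Fraction of visitors who purchased at least once."""
    visitors = {visit["visitor"] for visit in visits}
    buyers = {purchase["visitor"] for purchase in purchases} & visitors
    return len(buyers) / len(visitors)

def purchases_per_campaign(visits, purchases):
    """Number of purchases attributed to the campaign of the purchasing visitor."""
    campaign_of = {visit["visitor"]: visit["campaign"] for visit in visits}
    return Counter(
        campaign_of[purchase["visitor"]]
        for purchase in purchases
        if purchase["visitor"] in campaign_of
    )

buyer_share = share_of_buyers(visits, purchases)
by_campaign = purchases_per_campaign(visits, purchases)
print(f"{buyer_share:.0%} of visitors purchased something")
print(by_campaign.most_common())
```

The join key (here a visitor ID) is the critical part. Without a shared identifier between the traffic data and the other corporate data, the correlation described above is not possible.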

Practical applications of Web mining technology are abundant, and the questions above are by no means the limit of this technology. Web mining tools can be extended and programmed to answer almost any question.

Web mining can provide companies with managerial insight into visitor profiles, which helps top management take strategic actions accordingly. Also, the company can obtain some subjective measurements through Web Mining on the effectiveness of its marketing campaigns or marketing research, which will help the business to improve and align its marketing strategies in a timely manner.

For example, the company may have a list of goals such as the following:

· Increase average page views per session;

· Increase average profit per checkout;

· Decrease products returned;

· Increase number of referred customers;

· Increase brand awareness;

· Increase retention rate (such as number of visitors that have returned within 30 days);

· Reduce clicks-to-close (average page views to accomplish a purchase or obtain desired information);

· Increase conversion rate (checkouts per visit).
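Three of these goals can be measured directly from session data, as the following sketch shows. The sessions and page paths are invented, a session is treated as one visit, and reaching the hypothetical `/checkout` page is taken to mean a completed purchase.

```python
# Hypothetical sessions: each is the ordered list of pages one visit viewed.
sessions = [
    ["/home", "/product", "/cart", "/checkout"],
    ["/home", "/product"],
    ["/home"],
    ["/product", "/checkout"],
]
CHECKOUT_PAGE = "/checkout"  # assumed marker for a completed purchase

def average_page_views(sessions):
    """Average page views per session."""
    return sum(len(pages) for pages in sessions) / len(sessions)

def conversion_rate(sessions):
    """Checkouts per visit, counting each session as one visit."""
    return sum(CHECKOUT_PAGE in pages for pages in sessions) / len(sessions)

def clicks_to_close(sessions):
    """Average page views needed to reach the checkout, or None if never reached."""
    counts = [
        pages.index(CHECKOUT_PAGE) + 1 for pages in sessions if CHECKOUT_PAGE in pages
    ]
    return sum(counts) / len(counts) if counts else None

print("average page views per session:", average_page_views(sessions))
print("conversion rate:", conversion_rate(sessions))
print("clicks-to-close:", clicks_to_close(sessions))
```

Tracking these numbers before and after a change to the site gives the feedback loop for such goals.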

The company can identify the strengths and weaknesses of its web marketing campaign through Web Mining, make strategic adjustments, and then obtain feedback from Web Mining again to see the improvement. This is an ongoing, continuous process.

Next, we will give some examples on Web Mining applications.