Conversation Mining with LUIS

This is a guest article by Zvi Topol based on an article that has previously appeared in MSDN Magazine. If you have a guest article or process mining case study that you would like to share as well, please contact us via

The Language Understanding Intelligent Service (LUIS) is a Microsoft Cognitive Services API that offers machine-learning-based natural language understanding as a service for developers. There are many use cases for LUIS, including natural language interfaces such as chatbots, voice interfaces and cognitive search engines.

When given a textual user input, also called an ‘utterance’, LUIS returns the intent detected behind the utterance. So, LUIS can help the developer to find out automatically what the user intends to ask about.

In this article, I will focus on how to get insights from conversational data. By ‘conversational data’, I mean data composed of sequences of utterances that collectively make up a conversation.

I will show how to transform conversational data, which is innately unstructured, into a structured dataset by applying LUIS to each utterance in a conversation. Then, I will use process mining on the transformed, structured dataset to derive insights about the original conversations.

Let’s get started.

Getting Conversational Data Ready for Process Mining

To represent conversations as processes, each conversation is treated as a case (identified by its case ID), and the intents of the utterances in that conversation are treated as the activities of the process.

Let’s take a look at an example of conversational data from the financial technology space (see one conversation in the screenshot below).[1] In this example, users are having conversations with a chatbot about mortgages. To keep things simple, I have chosen to include only the user utterances, not the system responses. If you wanted, you could decide to include the system responses or any other data you think is related, such as information pertaining to the chat sessions, user data and so on.

Based on each utterance, LUIS can now identify what the user is asking about. It also detects the different entities (references to real-world objects) that appear in the utterance. Additionally, it outputs a confidence score for each intent and entity detected. These scores are numbers in the range [0, 1], with 1 indicating the highest confidence in the detection and 0 the lowest.

Under the hood, LUIS utilizes machine learning models that are able to detect the intents and entities and can be trained on newly supplied examples. Such examples are specific to the application domain the developer focuses on. This allows developers to customize intent and entity detection to the utterances asked by the users.

The following is an example of the output by LUIS when trained on a few examples in a financial technology application domain where users can ask questions about their bank accounts or financial products such as mortgages:

{
  "query": "what are annual rates for savings accounts",
  "topScoringIntent": {
    "intent": "OtherServicesIntent",
    "score": 0.577525139
  },
  "intents": [
    {
      "intent": "OtherServicesIntent",
      "score": 0.577525139
    },
    {
      "intent": "PersonalAccountsIntent",
      "score": 0.267547846
    },
    {
      "intent": "None",
      "score": 0.00754897855
    }
  ],
  "entities": []
}

As you can see, LUIS outputs the different intents it was trained on along with their confidence scores. Note that in this example, and throughout the rest of this article, I focus on intents and do not use entity detection.
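As a minimal sketch of how such a response could be reduced to a single intent label for the later process mining steps, the Python snippet below parses the example output above and returns the top-scoring intent. The helper name `top_intent` and the `min_score` cutoff are assumptions made for illustration; they are not part of LUIS.

```python
import json

# Response text copied from the example LUIS output shown above.
RESPONSE = """
{
  "query": "what are annual rates for savings accounts",
  "topScoringIntent": {"intent": "OtherServicesIntent", "score": 0.577525139},
  "intents": [
    {"intent": "OtherServicesIntent", "score": 0.577525139},
    {"intent": "PersonalAccountsIntent", "score": 0.267547846},
    {"intent": "None", "score": 0.00754897855}
  ],
  "entities": []
}
"""


def top_intent(response: dict, min_score: float = 0.5, fallback: str = "None") -> str:
    """Return the top-scoring intent, or `fallback` if its score is below `min_score`.

    `min_score` is a hypothetical cutoff chosen for this sketch.
    """
    top = response["topScoringIntent"]
    return top["intent"] if top["score"] >= min_score else fallback


parsed = json.loads(RESPONSE)
print(top_intent(parsed))                 # uses the default cutoff of 0.5
print(top_intent(parsed, min_score=0.8))  # a stricter cutoff falls back to "None"
```

Whether to apply a cutoff at all is a design choice: a low-confidence intent may be better treated as "None" than mapped to a misleading activity name.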

The following intents are included in the data:

  • GreetingIntent: a greeting or conversation opener.
  • ExplorationIntent: a general exploratory utterance made by the user.
  • OperatorRequestIntent: a request by the user to speak with a human operator.
  • SpecificQuestionIntent: a question from the user about mortgage rates.
  • ContactInfoIntent: contact information provided by the user.
  • PositiveFeedbackIntent: positive feedback provided by the user.
  • NegativeFeedbackIntent: negative feedback provided by the user.
  • EndConversationIntent: ending of the conversation with the bot initiated by the user.

For the five events in the conversation with ConversationId 3 in the initial data sample above, the following intents are identified for each utterance:


In this way, the original conversational data is transformed into a sequence of intents. The result is used to enrich the original data set with a fourth column called ‘Intent’.
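The enrichment step described above could be sketched roughly as follows. The sample rows are made up for illustration and are not taken from the article's data set, and `stub_detect_intent` is a hypothetical keyword-based stand-in for the actual call to LUIS.

```python
import csv
import io
from typing import Callable

# Hypothetical sample rows with the three original columns.
RAW_CSV = """ConversationId,TimeStamp,Utterance
1,2018-01-01 10:00:00,hello
1,2018-01-01 10:00:30,what are your mortgage rates
1,2018-01-01 10:01:10,thanks bye
"""


def stub_detect_intent(utterance: str) -> str:
    """Stand-in for a LUIS call; a keyword lookup used only for illustration."""
    text = utterance.lower()
    if "mortgage" in text:
        return "SpecificQuestionIntent"
    if "bye" in text:
        return "EndConversationIntent"
    if "hello" in text:
        return "GreetingIntent"
    return "None"


def enrich(raw_csv: str, detect_intent: Callable[[str], str]) -> str:
    """Append an 'Intent' column computed from the 'Utterance' column."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames + ["Intent"], lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        row["Intent"] = detect_intent(row["Utterance"])
        writer.writerow(row)
    return out.getvalue()


enriched = enrich(RAW_CSV, stub_detect_intent)
print(enriched)
```

Passing the detector in as a function keeps the CSV handling separate from the intent detection, so the stub can be swapped for a real LUIS client without touching the rest.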

When we import the enriched data set into Disco, the fields in the CSV dataset are configured as follows (see also the screenshot below):

  • ConversationId: Identifies the conversation in a unique way and is mapped to the case ID.
  • TimeStamp: The timestamp for a given Conversation ID/Utterance pair is configured as the timestamp for process mining.
  • Utterance: The user’s utterance (essentially unstructured text data) to which LUIS is applied to identify intents is included as an attribute.
  • Intent: The intent identified by LUIS is mapped as the activity name for process mining.

Applying Process Mining to Conversational Data Using Disco

After importing the CSV file into Disco based on the configuration shown above, you can see the discovered process map based on the conversational data (see screenshot below).

The process map is a graphical representation of the different transitions in the process between the events, as well as frequencies and repetitions of different activities. In our data set, the transitions that are shown are the transitions between the intents.
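To make the idea of transitions between intents concrete, here is a simplified sketch that counts how often one intent is immediately followed by another. It is not how Disco builds its map internally; it only illustrates the kind of frequencies the map displays. The intent sequences are hypothetical.

```python
from collections import Counter

# Hypothetical intent sequences, one per conversation, ordered by timestamp.
conversations = {
    1: ["GreetingIntent", "SpecificQuestionIntent", "EndConversationIntent"],
    2: ["SpecificQuestionIntent", "EndConversationIntent"],
    3: ["SpecificQuestionIntent", "NegativeFeedbackIntent"],
}


def directly_follows(traces):
    """Count how often intent `a` is immediately followed by intent `b`."""
    counts = Counter()
    for intents in traces.values():
        # zip pairs each intent with its successor in the same conversation.
        counts.update(zip(intents, intents[1:]))
    return counts


transitions = directly_follows(conversations)
for (src, dst), n in transitions.most_common():
    print(f"{src} -> {dst}: {n}")
```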

From the discovered process map, you can get a general overview of the conversations and see that conversations can start in one of three different ways: a greeting, an operator request or a mortgage-specific question, with mortgage-specific questions being very frequent. Most conversations end with an EndConversationIntent, but a few end with other intents that represent greetings and negative feedback. Conversations that end with negative feedback, in particular, can point to outliers that may require more attention.
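The same start and end information can be approximated outside of Disco. The sketch below, using hypothetical intent sequences, counts the first and last intent of each conversation and flags conversations that do not end with EndConversationIntent.

```python
from collections import Counter

# Hypothetical intent sequences, one per conversation, ordered by timestamp.
conversations = {
    1: ["GreetingIntent", "SpecificQuestionIntent", "EndConversationIntent"],
    2: ["SpecificQuestionIntent", "EndConversationIntent"],
    3: ["OperatorRequestIntent", "SpecificQuestionIntent", "NegativeFeedbackIntent"],
}


def start_and_end_counts(traces):
    """Count which intents open and close the conversations."""
    starts = Counter(intents[0] for intents in traces.values() if intents)
    ends = Counter(intents[-1] for intents in traces.values() if intents)
    return starts, ends


def unusual_endings(traces, expected="EndConversationIntent"):
    """Return the IDs of conversations whose last intent is not `expected`."""
    return [cid for cid, intents in traces.items() if intents and intents[-1] != expected]


starts, ends = start_and_end_counts(conversations)
flagged = unusual_endings(conversations)
print(starts, ends, flagged)
```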

Moreover, transitions between different intents can also provide very useful information for deriving insights. For example, it may be possible to determine whether there are specific utterances or intents that lead to the intent representing negative feedback. It might then be desirable to drive conversations away from that path.
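One rough way to look for intents that lead to negative feedback is to count which intent immediately precedes NegativeFeedbackIntent. The snippet below is a sketch on hypothetical data; looking only at the immediate predecessor is a simplifying assumption, and longer paths may matter in practice.

```python
from collections import Counter

# Hypothetical intent sequences, one per conversation, ordered by timestamp.
conversations = {
    1: ["GreetingIntent", "SpecificQuestionIntent", "NegativeFeedbackIntent"],
    2: ["SpecificQuestionIntent", "EndConversationIntent"],
    3: ["OperatorRequestIntent", "NegativeFeedbackIntent", "EndConversationIntent"],
    4: ["SpecificQuestionIntent", "NegativeFeedbackIntent"],
}


def predecessors(traces, target="NegativeFeedbackIntent"):
    """Count the intents that occur immediately before `target`."""
    counts = Counter()
    for intents in traces.values():
        for prev, cur in zip(intents, intents[1:]):
            if cur == target:
                counts[prev] += 1
    return counts


before_negative = predecessors(conversations)
print(before_negative.most_common())
```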

Information about repetitions of both intents and transitions is readily available as part of the discovered process map. In particular, you can see that the two most common intents in this case are SpecificQuestionIntent and EndConversationIntent, and that transitions from the former to the latter are very common. This provides a good summary at a glance regarding the content of the conversations.

It can also present an opportunity to improve conversations by breaking down SpecificQuestionIntent and EndConversationIntent into finer-grained intents that capture more insightful aspects of the user interaction. This should be followed by retraining LUIS and repeating the application of process mining to the modified conversational data.

When we look at the overview statistics (see screenshot below), we can get insights about the duration of the conversations. This can be useful to identify outliers, such as extremely short conversations, and to cross check with conversations from the map view regarding potentially problematic conversations. It is also possible to identify conversations with longer durations. In the example I use here, those are likely to be successful conversations.
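Conversation durations of the kind shown in the overview statistics can be approximated as the time between the first and last utterance of each conversation. The sketch below uses hypothetical events, and the timestamp format is an assumption that may differ from the actual CSV file.

```python
from datetime import datetime

# Hypothetical (ConversationId, TimeStamp) pairs.
events = [
    (1, "2018-01-01 10:00:00"),
    (1, "2018-01-01 10:02:30"),
    (2, "2018-01-01 11:00:00"),
    (2, "2018-01-01 11:00:05"),
]


def durations(rows, fmt="%Y-%m-%d %H:%M:%S"):
    """Return the time between the first and last event of each conversation."""
    first, last = {}, {}
    for cid, ts in rows:
        t = datetime.strptime(ts, fmt)
        first[cid] = min(first.get(cid, t), t)
        last[cid] = max(last.get(cid, t), t)
    return {cid: last[cid] - first[cid] for cid in first}


result = durations(events)
# List the shortest conversations first, since those may be outliers.
for cid, d in sorted(result.items(), key=lambda kv: kv[1]):
    print(cid, d.total_seconds())
```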

In order to dive deeper into conversations that exhibit interesting behaviors, for example, unusually long or short conversations, or conversations with certain intent structures, you can use Disco’s powerful filtering capabilities. At any given point, Disco allows you to filter the overall dataset by various dimensions. This allows you to identify patterns common to the filtered conversations.

We can also get some overview statistics at the intent level by using the Activity section of the Statistics view (see screenshot below). We can see that, fortunately, the negative feedback intent comprises only about 3 percent of the intents in our conversations.
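The relative frequency of each intent can be computed along the following lines. The list of intents is hypothetical, so the resulting shares differ from the figure quoted above for the article's data set.

```python
from collections import Counter

# Hypothetical flat list of intents across all conversations (10 events).
intents = (
    ["GreetingIntent"]
    + ["SpecificQuestionIntent"] * 4
    + ["EndConversationIntent"] * 3
    + ["ExplorationIntent"]
    + ["NegativeFeedbackIntent"]
)


def relative_frequency(labels):
    """Return each intent's share of all events as a fraction between 0 and 1."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {intent: n / total for intent, n in counts.items()}


shares = relative_frequency(intents)
for intent, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{intent}: {share:.0%}")
```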

Finally, we can also look at individual conversations based on their variants. A ‘variant’ groups all the conversations that have the same flow of intents. We can inspect the different variants to see whether they correspond to the expected scenarios.

For example, in the screenshot below you can see a specific conversation (ConversationId 9) that belongs to a variant with two intents: SpecificQuestionIntent and EndConversationIntent. By comparing conversations that have similar structures, you can learn if there are any patterns that you can adopt that would help make conversations more successful. If you happen to find unexpected differences, it can help you to discover what is causing them.
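Grouping conversations into variants amounts to grouping them by their exact sequence of intents. A minimal sketch, using hypothetical conversations, could look like this:

```python
from collections import defaultdict

# Hypothetical intent sequences, one per conversation, ordered by timestamp.
conversations = {
    1: ["GreetingIntent", "SpecificQuestionIntent", "EndConversationIntent"],
    2: ["SpecificQuestionIntent", "EndConversationIntent"],
    3: ["SpecificQuestionIntent", "EndConversationIntent"],
    4: ["SpecificQuestionIntent", "NegativeFeedbackIntent"],
}


def variants(traces):
    """Group conversation IDs by intent sequence, most frequent variant first."""
    groups = defaultdict(list)
    for cid, intents in traces.items():
        groups[tuple(intents)].append(cid)
    return sorted(groups.items(), key=lambda kv: len(kv[1]), reverse=True)


ranked = variants(conversations)
for sequence, ids in ranked:
    print(" -> ".join(sequence), ids)
```

Once conversations are grouped this way, the utterances within a variant can be compared side by side to see what the conversations have in common.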


In this article, I have shown how process mining can be leveraged in conjunction with LUIS to derive insights from conversational data.

In particular, LUIS is applied to the different utterances in the conversations to transform unstructured utterance text to structured intent labels.

Then, by mapping conversation IDs, timestamps and intents to process mining fields, I showed how to apply process mining to the structured conversational data in Disco. By discovering the overall conversation process, it is possible to derive insights from the transformed conversational data. For example, we can learn what makes a conversation successful and use that knowledge to improve conversations that are less successful.

I encourage you to explore this area further on your own. For example, you could use many additional fields as part of your activity representation (e.g. information about specific entities in user utterances; the responses of your conversational interface; or data about your users, such as locations, previous interactions with the system, and so on). Such rich representations will enable you to enhance the depth of insights from your conversational data and, ultimately, create better, more compelling conversational interfaces.


Zvi Topol has been working as a data scientist in various industry verticals, including marketing analytics, media and entertainment, and Industrial Internet of Things. He has delivered and led multiple machine learning and analytics projects including natural language and voice interfaces, cognitive search, video analysis, recommender systems and marketing decision support systems. Topol is currently with MuyVentive, an advanced analytics R&D company, and can be reached at

  1. You can download the CSV file containing 10 different simulated conversations to follow along here.