Most modern chat bot platforms consist of three main components: intent recognition, slot filling and a dialog graph.
1.1 Intent recognition
Intent recognition is a text classification task whose goal is to capture the specific intent behind a user query. It is motivated by the fact that users tend to formulate their requests in many different ways, so we need a system that can tell whether those messages relate to the same thing or not. Let’s illustrate this with an example of a bank chat bot, where users can ask the bot to withdraw money:
You can see that although the requests are formulated in many different ways and styles, they all mean basically the same thing, and the chat bot should react to them in the same way. Therefore we need a text classification model that captures the semantics behind user sentences and assigns them to a specific predefined class.
1.2 Slot filling
Once we know what action the user wants to take, we need to capture the specific parameters of that action. For example, if you ask Alexa to play your favorite song, you want her to play that specific song, not just any song. So besides detecting the intent, chat bots also need to perform a task called slot filling.
1.3 Dialog graph
Another requirement for chat bot functionality is a dialog graph. Its goal is to steer the conversation in the right direction. For example, when you say “Check the weather”, the chat bot could ask “What day should I check the weather for?” and then look for intents like ‘tomorrow’ or ‘today’. The important part is that there would be no point in asking the second question without the first one, so we need a system that stores where we are in the conversation and what the possible next states are.
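A dialog graph of this kind can be sketched as a small state machine. The state names, intent names and replies below are hypothetical and only mirror the weather example; a real platform would store much richer state.

```python
# Hypothetical dialog graph: each state maps an expected intent
# to a (next_state, reply) pair.
DIALOG_GRAPH = {
    "start": {
        "check_weather": ("ask_day", "What day should I check the weather for?"),
    },
    "ask_day": {
        "today": ("start", "Checking the weather for today."),
        "tomorrow": ("start", "Checking the weather for tomorrow."),
    },
}


def step(state, intent):
    """Return (next_state, reply); stay in place if the intent is not expected here."""
    transitions = DIALOG_GRAPH.get(state, {})
    if intent in transitions:
        return transitions[intent]
    return state, "Sorry, I did not understand that."


state, reply = step("start", "check_weather")
print(reply)
state, reply = step(state, "tomorrow")
print(reply)
```

Note that the intent ‘tomorrow’ is only accepted in the `ask_day` state, which is exactly the "no second question without the first one" constraint described above.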
1.4 Our chat bot
In this tutorial our goal is to create a simple chat bot, so we are going to focus only on the intent detection task and a simple dialog graph model. This is enough to make a chat bot that is able to answer FAQs and conduct a simple conversation.
Our goal in designing the intent detection component is to create a system that, given a few examples for each intent, can detect that a sentence written by the user is similar to these examples and therefore should be assigned the same intent.
The underlying problem is that we have to design a way of checking whether two sentences are similar. This could be achieved by, e.g., counting how many words the new sentence shares with the sentences in the training data set. This is, however, a naive approach, because a user can use a word that has a similar meaning but is different from the ones in the training examples.
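The weakness of the word-overlap approach is easy to see in a minimal sketch. The function name and the example sentences are made up for illustration.

```python
def word_overlap(sentence_a, sentence_b):
    """Count the distinct lowercase words that two sentences share."""
    words_a = set(sentence_a.lower().split())
    words_b = set(sentence_b.lower().split())
    return len(words_a & words_b)


# Shares "i", "to" and "withdraw" with the training example.
print(word_overlap("I want to withdraw money", "I would like to withdraw cash"))

# Same meaning, but no shared words at all, so the overlap is zero.
print(word_overlap("I want to withdraw money", "Give me some cash"))
```

The second pair expresses the same request, yet it scores zero because "money" and "cash" are different strings.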
2.1 Word embedding
A solution here is to use word embeddings.
Word embeddings are mathematical representations of words encoded as vectors in n-dimensional space. Similar words (words used in the same contexts) are close to each other in this space. This means that we can compare two or more words not by, e.g., the number of overlapping characters, but by how close they are to each other in their embedded form.
2.2 Sentence embeddings
From word embeddings we can construct an embedding for a whole sentence. This can be done in a variety of ways: we can simply take the average of the word vectors, use a weighted average in which each word is weighted by its importance (e.g. its tf-idf coefficient), or use more advanced methods such as transformer neural networks.
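The simplest of these options, averaging the word vectors, can be sketched as follows. The three-dimensional vectors are toy values invented for the example; real word embeddings have hundreds of dimensions and come from a trained model.

```python
# Toy word vectors, invented for illustration only.
WORD_VECTORS = {
    "withdraw": [0.9, 0.1, 0.0],
    "money": [0.1, 0.8, 0.1],
    "cash": [0.2, 0.7, 0.1],
}


def mean_sentence_embedding(sentence, word_vectors):
    """Average the vectors of the known words in a sentence."""
    vectors = [word_vectors[w] for w in sentence.lower().split() if w in word_vectors]
    if not vectors:
        raise ValueError("no known words in sentence")
    dims = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]


print(mean_sentence_embedding("withdraw money", WORD_VECTORS))
print(mean_sentence_embedding("withdraw cash", WORD_VECTORS))
```

In this sketch unknown words are simply skipped, which is one of several possible ways to handle out-of-vocabulary words.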
2.3 Similarity
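Once we have prepared embeddings for the sentences, we need a way of comparing them. A simple and widely used method is cosine similarity, which measures the similarity between two vectors as the cosine of the angle between them: 1 for vectors pointing in the same direction, 0 for orthogonal vectors and -1 for vectors pointing in opposite directions.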
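Cosine similarity is the dot product of two vectors divided by the product of their lengths. A minimal pure-Python sketch (the function name is ours; returning 0.0 for a zero-length vector is a convention chosen here to avoid division by zero):

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))  # same direction
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal
```

Because only the angle matters, a vector and a scaled copy of it are treated as identical, which is convenient when sentences of different lengths produce vectors of different magnitudes.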
To create the sentence embeddings we are going to use the flair library. This library is based not only on static word embeddings but also analyses words character by character, which helps in dealing with out-of-vocabulary words.
In our model we are going to embed the examples for each intent and then, while processing the user’s message, find the most similar one. We take this approach mainly because it is fast and simple and illustrates how embeddings work. Most modern systems use neural networks (links to related articles can be found at the end), but this approach can still be used if you want to design a system that is fast and doesn’t use a lot of resources.
We begin our program by creating the outline of the model.
Description
1–9 : importing necessary libraries
11 : initialization of the flair model for creating sentence embeddings. We are using English word embeddings and the mean pooling method for creating sentence embeddings from word embeddings.
13–20 : chatbot class. This class has two static methods: one for creating embeddings and one for processing the user’s message and answering it.
3.1 Preparing embeddings
First we need to prepare a file containing our intents and their examples. This is a JSON dictionary that uses intents as keys and lists of examples as values.
Next we need to create a function that constructs embeddings for the examples.
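A file in this format could be produced as follows. The intent names, the example phrases and the helper `save_intents` are hypothetical and serve only to show the shape of the data.

```python
import json

# Hypothetical intents and example phrases, for illustration only.
INTENTS = {
    "greeting": ["Hello", "Hi there", "Good morning"],
    "withdraw_money": ["I want to withdraw money", "Can I take out some cash?"],
}


def save_intents(intents, path):
    """Write the intents dictionary to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(intents, f, indent=2)
```

Calling `save_intents(INTENTS, "intents.json")` would write the dictionary to a file (the file name is an assumption), with one key per intent and a list of example sentences as its value.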
Description:
4 : Creating new python dictionary for the embeddings
5–6 : Opening the input file and loading it to python dictionary
7–8 : For each intent we create a list in the embeddings dictionary
9–12 : For each example in the intent, we create a Flair sentence object that we can later embed using the model specified earlier. Finally we add the embedded sentence to the list
13–14 : If the file doesn’t exist, we create it
15 : We save the embeddings dict. We use pickle instead of JSON because JSON cannot store numpy arrays directly
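The steps described above can be sketched roughly as follows. This is not the original listing: the function names and file handling are our assumptions, and the embedding step is passed in as a callable `embed` so that the logic can be tried without flair installed. `make_flair_embedder` shows one way such a callable might be built with flair's `DocumentPoolEmbeddings` and mean pooling.

```python
import json
import os
import pickle


def prepare_embeddings(intents_path, output_path, embed):
    """Embed every example of every intent and pickle the result.

    `embed` is any callable mapping a string to a vector (a stand-in
    for the flair model used in this tutorial).
    """
    with open(intents_path, encoding="utf-8") as f:
        intents = json.load(f)

    embeddings = {}
    for intent, examples in intents.items():
        embeddings[intent] = [embed(example) for example in examples]

    # Create the output directory if it does not exist yet.
    out_dir = os.path.dirname(output_path)
    if out_dir:
        os.makedirs(out_dir, exist_ok=True)

    with open(output_path, "wb") as f:
        pickle.dump(embeddings, f)
    return embeddings


def make_flair_embedder():
    """Build an `embed` callable backed by flair (requires flair to be installed)."""
    from flair.data import Sentence
    from flair.embeddings import DocumentPoolEmbeddings, WordEmbeddings

    model = DocumentPoolEmbeddings([WordEmbeddings("en")], pooling="mean")

    def embed(text):
        sentence = Sentence(text)
        model.embed(sentence)
        return sentence.get_embedding().detach().cpu().numpy()

    return embed
```

With flair available, `prepare_embeddings("intents.json", "embeddings.pkl", make_flair_embedder())` would embed all examples; both file names are placeholders.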
3.2 Answering the message
Description
3 : We use the embeddings model
4–5 : We load the embeddings file created earlier
6–8 : Embedding of the user’s message
9–10 : Initializing the best intent and best score variables
11–16 : For each intent we loop through its embedded examples and check the cosine similarity between the user’s message and those examples.
We choose the intent whose example has the highest similarity with the new message
17–18 : Loading the answers dict
19 : Checking if intent chosen by the system is in the answers dict
20 : Return random answer from the ones assigned to the chosen intent
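The answering logic described above can be sketched like this. It is an approximation, not the original listing: the embeddings and answers are passed in as dictionaries instead of being loaded from files, and `embed` is again a stand-in callable for the flair model.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def answer(message, embeddings, answers, embed):
    """Pick the intent whose example is most similar to the message and reply."""
    message_vector = embed(message)

    best_intent, best_score = None, float("-inf")
    for intent, examples in embeddings.items():
        for example_vector in examples:
            score = cosine_similarity(message_vector, example_vector)
            if score > best_score:
                best_intent, best_score = intent, score

    # Reply only if we have answers for the chosen intent.
    if best_intent in answers:
        return random.choice(answers[best_intent])
    return None
```

Comparing against every stored example is linear in the number of examples, which is fine for an FAQ-sized bot but would need an index for large data sets.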
In this form the chat bot always has to choose one of the provided intents. This means we have no way of detecting that the user said something that doesn’t belong to any of them. A possible solution is to inspect the cosine similarity values and, based on those observations, set a threshold below which the bot classifies the message as one it doesn’t know how to answer.
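A threshold of this kind might look as follows. The function name is ours, and the default of 0.7 is an arbitrary placeholder rather than a recommended value; it should be tuned by looking at the scores your own data produces.

```python
def pick_intent(scores, threshold=0.7):
    """Return the best-scoring intent, or None if the best score is below the threshold.

    `scores` maps each intent to its highest cosine similarity with the message.
    """
    if not scores:
        return None
    best_intent = max(scores, key=scores.get)
    if scores[best_intent] < threshold:
        return None  # the bot does not know how to answer
    return best_intent


print(pick_intent({"greeting": 0.92, "withdraw_money": 0.41}))
print(pick_intent({"greeting": 0.35, "withdraw_money": 0.41}))
```

When the function returns `None`, the bot can reply with a fallback message instead of guessing an intent.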