How To Use Deep RL For Chatbot Evolution And Make It More Human?


Manoj Rupareliya

Chatbots are the new-age sensation. From the banking industry to personal voice assistants, and from smartphones to smart homes, these bots are on the rise in customer service and even retail. What makes them intelligent is their capacity for self-learning, and that attribute needs more exploration.

There are many ways to teach chatbots to behave more like humans and interact better with them. One such revolutionary method is Deep Reinforcement Learning (Deep RL). Recent advances in the research and application of Deep RL have improved chatbots' learning capabilities.

Three basic types of machine learning are used for chatbot learning:

  1. Supervised Learning
  2. Unsupervised Learning
  3. Reinforcement Learning

Chatbots can be trained on a training data set to create a model. Whenever new input data is introduced to the ML algorithm used by the chatbot, the algorithm makes a prediction on the basis of that model. The prediction is evaluated for accuracy and several other factors, and if the accuracy is acceptable, the algorithm is deployed. This learning process is repeated to keep improving the chatbot.

Image Source: How machine learning works?
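The train, predict, evaluate, and deploy loop described above can be sketched in a few lines. This is only a minimal illustration: the word-count "model", the tiny data sets, and the `ACCEPTABLE` threshold are all assumptions made for the example, not part of any particular chatbot framework.

```python
from collections import Counter, defaultdict

def train(examples):
    """Count how often each word appears under each intent label."""
    counts = defaultdict(Counter)
    for text, label in examples:
        counts[label].update(text.lower().split())
    return counts

def predict(model, text):
    """Pick the label whose training words overlap most with the input."""
    words = text.lower().split()
    scores = {label: sum(c[w] for w in words) for label, c in model.items()}
    return max(scores, key=scores.get)

def accuracy(model, examples):
    """Share of held-out examples that the model labels correctly."""
    correct = sum(predict(model, text) == label for text, label in examples)
    return correct / len(examples)

# Hypothetical training and evaluation data.
train_set = [
    ("hello there", "greeting"),
    ("hi good morning", "greeting"),
    ("what is my balance", "balance"),
    ("show account balance", "balance"),
]
test_set = [
    ("hello good morning", "greeting"),
    ("account balance please", "balance"),
]

model = train(train_set)
acc = accuracy(model, test_set)

ACCEPTABLE = 0.9            # hypothetical deployment threshold
deploy = acc >= ACCEPTABLE  # deploy only if the evaluation is good enough
```

In practice the evaluation step would use a much larger held-out set and more metrics than accuracy alone.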

Supervised learning makes the algorithm learn from pre-recorded, labeled data sets for particular scenarios. In simple words, a mentor data set guides the algorithm, which learns probabilities and results by applying that data set to different situations.

For example, suppose an algorithm learns about the balls on a pool table, each with a number and a color, and is then asked to identify a particular numbered ball. Because the machine has already learned about all the balls, it can easily identify any ball from the set.

There Are Two Categories of Supervised Learning Algorithms:

  1. Classification:
    A problem is a classification problem when the output variable is a category, such as a color.
  2. Regression:
    A problem is a regression problem when the output variable is a continuous numerical value, such as a price or a weight.
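To make the difference concrete, here is a small sketch in which one function returns a category and the other returns a number. The nearest-neighbour rule, the least-squares line, and the sample values are illustrative assumptions, not methods prescribed by the article.

```python
def classify_nearest(samples, x):
    """1-nearest-neighbour classification: the output is a category."""
    nearest = min(samples, key=lambda s: abs(s[0] - x))
    return nearest[1]

def fit_line(points):
    """Ordinary least squares for y = a * x + b: the output is numeric."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

# Classification: hypothetical wavelengths (nm) labeled with a color.
colors = [(450, "blue"), (530, "green"), (680, "red")]
label = classify_nearest(colors, 520)

# Regression: hypothetical points that lie on y = 2x + 1.
a, b = fit_line([(1, 3), (2, 5), (3, 7)])
prediction = a * 4 + b
```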

Unsupervised learning trains a machine on data that is neither classified nor labeled. The machine is allowed to identify and analyze the data without any guidance. Its task is to group unsorted data according to similarities, patterns, and differences, without any prior training on that data.

For example, the algorithm is given a basket of different fruits that it has never seen before. It analyzes each fruit and groups the fruits by size, shape, and color, without any mentor data set.

Unsupervised Learning Algorithms can be classified into two major types:

  1. Clustering:
    In a clustering problem, the machine groups objects by their inherent characteristics.
  2. Association:
    Association learning finds relationships between items in a data set and identifies patterns on that basis.

Reinforcement Learning (RL) methods are typically based on value functions or policy search, and the same holds for the Deep RL paradigm, which implements them with neural networks. Value functions have mostly been applied to task-oriented dialogue systems, while policy search has mostly been applied to open-ended dialogue systems such as chatbots.

Policy search is used for chatbots because chatbot interactions have infinite action sets, whereas value function-based methods work only with finite action sets. This explains the strong preference for policy search in Deep RL for chatbots. However, policy search methods have issues of their own: convergence to local rather than global optima, inefficiency, and high variance.

The use of value function-based methods for chatbots has not been fully explored so far, and it is worth exploring with action sets that are derived automatically. Alternatives to Deep RL include seq2seq models for dialogue generation.

Seq2Seq Model:

The Seq2Seq model is considered revolutionary for translating text while keeping grammar and sentence structure intact, which was absent in its predecessors. It takes a sequence of words as input and generates another sequence of words as output. To produce the output sequence, it uses Recurrent Neural Networks (RNNs).

Image Source: Seq2Seq model

It has an encoder, a neural network that converts the input sequence of words into corresponding hidden vectors, and a decoder that uses the hidden state and the current word to predict the next word.
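The data flow of the encoder and decoder can be sketched with a toy recurrent network in plain Python. Everything here is an assumption made for illustration: the seven-word vocabulary, the hidden size, the randomly initialised (untrained) weights, and the choice to share one recurrent cell between encoder and decoder. With untrained weights the reply is meaningless; the point is only how the hidden state and the current word feed the next prediction.

```python
import math
import random

VOCAB = ["<s>", "</s>", "how", "are", "you", "fine", "thanks"]
HIDDEN = 4
rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def one_hot(word):
    return [1.0 if w == word else 0.0 for w in VOCAB]

# Untrained, randomly initialised parameters (illustration only).
W_hh = rand_matrix(HIDDEN, HIDDEN)      # hidden state -> hidden state
W_xh = rand_matrix(HIDDEN, len(VOCAB))  # input word -> hidden state
W_hy = rand_matrix(len(VOCAB), HIDDEN)  # hidden state -> vocabulary scores

def rnn_step(hidden, word):
    """One recurrent update: h' = tanh(W_hh h + W_xh x)."""
    from_hidden = matvec(W_hh, hidden)
    from_input = matvec(W_xh, one_hot(word))
    return [math.tanh(a + b) for a, b in zip(from_hidden, from_input)]

def encode(words):
    """Fold the input words into a single hidden vector."""
    hidden = [0.0] * HIDDEN
    for word in words:
        hidden = rnn_step(hidden, word)
    return hidden

def decode(hidden, max_len=5):
    """Greedy decoding: hidden state plus current word predict the next word."""
    word, output = "<s>", []
    for _ in range(max_len):
        hidden = rnn_step(hidden, word)
        scores = matvec(W_hy, hidden)
        word = VOCAB[max(range(len(VOCAB)), key=lambda i: scores[i])]
        if word == "</s>":
            break
        output.append(word)
    return output

context = encode(["how", "are", "you"])
reply = decode(context)
```

A real system would learn the weights from dialogue data and would usually rely on a deep learning library rather than hand-written matrix code.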

These methods are data-hungry: they are trained on millions of sentences, which leads to high computational demands. Evaluating these models is also difficult, and better evaluation metrics are needed. Existing metrics such as BLEU and METEOR, among others, correlate poorly with human judgments.

Turning to performance metrics, the reward functions used by Deep RL dialogue agents are either specified manually, depending on the application, or learned from dialogue data. There is no clear, simple way yet to specify these reward functions for Deep RL of chatbots.

Mobile app development companies have integrated chatbots into smartphones through purpose-built APIs that expose machine learning capabilities, so that chatbots can learn and function using the heuristic approach of machine learning.

Image Source: Deep Reinforcement Learning for Chatbots Using Clustered Actions and Human-Likeness Rewards

A clustered action is a group of sentences with similar or related meaning, as measured by sentence vectors derived from word embeddings. Because the interactions between chatbots and humans are unbounded, the set of actions, that is, the sentences, can be infinite.

Open-ended conversations between humans and chatbots use larger vocabularies than goal- or task-oriented conversations. Many clustering algorithms could be used; two requirements guide the choice for chatbots:

  1. Handling unlabelled data, since human-human dialogues come as raw text.
  2. Scalability to clustering a large set of data points.

In this model, each data point corresponds to a sentence in a dialogue, and each sentence is represented by the mean of its word vectors.

Image Source: Deep Reinforcement Learning for Chatbots Using Clustered Actions and Human-Likeness Rewards
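Representing a sentence by the mean of its word vectors can be sketched as follows. The three-dimensional `EMBEDDINGS` table is a hypothetical stand-in; real systems would use pretrained embeddings with hundreds of dimensions.

```python
# Hypothetical 3-dimensional word embeddings, for illustration only.
EMBEDDINGS = {
    "hello": [1.0, 0.0, 0.0],
    "hi": [0.9, 0.1, 0.0],
    "there": [0.5, 0.5, 0.0],
    "bye": [0.0, 0.0, 1.0],
}

def sentence_vector(sentence, embeddings=EMBEDDINGS):
    """Return the element-wise mean of the known word vectors in a sentence."""
    vectors = [embeddings[w] for w in sentence.lower().split() if w in embeddings]
    if not vectors:
        raise ValueError("no known words in sentence")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

vec = sentence_vector("Hello there")
```

Words missing from the embedding table are simply skipped here, which is one of several possible design choices.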

The result is a trained clustering model that assigns a cluster ID to each feature vector x_i, where the number of actions equals the number of clusters.

Image Source: Deep Reinforcement Learning for Chatbots Using Clustered Actions and Human-Likeness Rewards
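Assigning cluster IDs to sentence vectors can be sketched with a plain k-means loop. The choice of k-means, the seeding with the first k points, and the two-dimensional sample vectors are assumptions made for this example; any clustering algorithm that meets the two requirements above would fit the same role.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(point, centroids):
    """Return the cluster ID (index) of the closest centroid."""
    return min(range(len(centroids)),
               key=lambda k: squared_distance(point, centroids[k]))

def kmeans(points, k, iterations=10):
    """Plain k-means; the first k points seed the centroids for determinism."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[assign(p, centroids)].append(p)
        for idx, group in enumerate(groups):
            if group:  # keep the old centroid if a cluster ends up empty
                centroids[idx] = [sum(c) / len(group) for c in zip(*group)]
    return centroids

# Hypothetical 2-dimensional sentence vectors forming two obvious groups.
points = [[0.0, 0.0], [5.0, 5.0], [0.2, 0.1],
          [5.1, 4.9], [0.1, 0.3], [4.8, 5.2]]

centroids = kmeans(points, k=2)
cluster_ids = [assign(p, centroids) for p in points]
num_actions = len(centroids)  # one action per cluster
```

Each cluster ID then serves as one discrete action, which turns an unbounded set of sentences into a finite action set.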

Specifying reward functions is considered hard for Deep RL methods. Hence, the rewards are derived from human-human dialogues: positive values are assigned to contextualized responses seen in the data, and negative values to randomly generated responses, which lack coherence and therefore read as non-human. These rewards make Deep RL possible for chatbots.

For example, an episode (dialogue) reward function can be derived as:

Image Source: Deep Reinforcement Learning for Chatbots Using Clustered Actions and Human-Likeness Rewards
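One way to picture this labeling scheme is the sketch below. It assumes a reward of +1 for each human response, -1 for each randomly substituted response, and a dialogue reward equal to their sum; these exact values, the function names, and the sample sentences are assumptions for illustration and may differ from the cited paper.

```python
import random

def make_training_dialogue(human_dialogue, distractor_pool, corrupt_ratio, rng):
    """Replace a share of responses with random sentences and label each turn.

    Returns (sentences, rewards): +1 for a kept human sentence,
    -1 for a randomly substituted one.
    """
    sentences, rewards = [], []
    for sentence in human_dialogue:
        if rng.random() < corrupt_ratio:
            sentences.append(rng.choice(distractor_pool))
            rewards.append(-1)
        else:
            sentences.append(sentence)
            rewards.append(+1)
    return sentences, rewards

def dialogue_reward(rewards):
    """Episode reward: the sum of the per-sentence rewards."""
    return sum(rewards)

# Hypothetical data.
human = ["hi", "hello, how are you?", "fine, thanks"]
distractors = ["purple monkey dishwasher", "seven clouds at noon"]

sentences, rewards = make_training_dialogue(human, distractors, 0.5, random.Random(42))
total = dialogue_reward(rewards)
```

Varying `corrupt_ratio` produces dialogues that range from fully human-like (highest reward) to fully random (lowest reward).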

Deep Reinforcement Learning (DRL) agents maximize their cumulative reward over time according to:

Image Source: Deep Reinforcement Learning for Chatbots Using Clustered Actions and Human-Likeness Rewards
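As a hedged illustration of cumulative reward, the sketch below computes the usual discounted sum r_t + γ·r_{t+1} + γ²·r_{t+2} + ... for a sequence of rewards. The discount factor `gamma` and its default value are assumptions; this is the textbook form, not necessarily the exact notation of the cited paper.

```python
def discounted_return(rewards, gamma=0.99):
    """Sum rewards, weighting the reward k steps ahead by gamma ** k."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total

# Three turns with reward +1 each and a discount factor of 0.5.
value = discounted_return([1, 1, 1], gamma=0.5)
```

A value-based agent such as a deep Q-network tries to estimate the expected value of this quantity for each state-action pair.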

During learning, a DRL agent chooses actions probabilistically, either to explore new state-action pairs and discover new rewards or to exploit already learned values, with less exploration and more exploitation over time.
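A common way to implement this trade-off is an epsilon-greedy rule with a decaying epsilon, sketched below. The rule itself, the linear schedule, and the default values are assumptions for illustration; the article does not say which exploration strategy is used.

```python
import random

def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon explore at random, otherwise exploit the best value."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def decayed_epsilon(step, start=1.0, end=0.1, decay_steps=1000):
    """Linearly anneal epsilon from start to end over decay_steps steps."""
    fraction = min(step / decay_steps, 1.0)
    return start + fraction * (end - start)

rng = random.Random(0)
q_values = [0.1, 0.9, 0.3]  # hypothetical learned values for three clustered actions

early_action = epsilon_greedy(q_values, decayed_epsilon(0), rng)     # mostly exploring
late_action = epsilon_greedy(q_values, decayed_epsilon(5000), rng)   # mostly exploiting
```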

Key observations:

  1. With the Deep RL clustered-actions method, chatbot agents can improve substantially over time, producing more error-free and relatively natural conversations.
  2. Training chatbots on different domains, such as groups of dialogues or sequences of words, is useful for improving performance.
  3. Data sets and tests of reward prediction models can be used to measure the validity of reward functions.
  4. With this method, short dialogue histories yield weak correlations, while longer dialogue histories yield stronger ones.
  5. The task of these DRL agents is to learn to choose human-like actions (sentences) from all the available responses, which include both human-generated and randomly generated sentences.

Benefits of Deep RL for chatbots:

  1. Better conversations between humans and chatbots.
  2. A better understanding of human behavior by chatbots.
  3. More powerful self-learning for chatbots through Deep RL.
  4. Automatic detection of relevant words and sentences in line with user preferences and buying patterns, for brands and retailers.
  5. Correction of conversational errors through repeated dialogue training, making human-chatbot dialogue more accurate.
  6. Fewer errors, which increases users' trust in Deep RL trained chatbots for conversations and other important transactions.
  7. Support for open-ended conversation with larger vocabularies of words and sentences.
  8. More efficient predictive and probabilistic analysis of data sets by chatbots for personalized financial and other products.

Advances in computational capabilities, machine learning paradigms, and technology allow further exploration of feature extraction, clustering algorithms, distance metrics, policy learning algorithms, architectures, and comparisons of reward functions for Deep RL training of chatbots.

Recent enterprise investments in Artificial Intelligence suggest that using AI for intelligent conversational agents like chatbots will change the way business is conducted and the way users interact with a brand or enterprise for services and purchases. We have already seen the rise of smart speakers and virtual assistants in the market, and improving chatbots through Deep RL could be the game changer this market needs.
