I discovered the chatbot framework Rasa a few weeks ago and was super impressed by the combination of pre-built tooling and flexibility.
Rasa is made up of two main components — (1) a natural language understanding (NLU) component so the chatbot can determine the intent of what you say to it and identify entities in what you say (NER) and (2) a core component that determines how the chatbot responds. Rasa is open source under an Apache 2.0 license.
Rasa X can be used with Rasa but is not necessary. Rasa X provides a UI to create training data and annotate conversations. You can even expose a link to testers and then your testers can create conversations that you can later evaluate. A good video demo of Rasa X is here. Rasa X has a specially drafted license (basically, you can use Rasa X for non-competing purposes but if you bring a patent claim against Rasa Technologies, you lose all rights…seek guidance from your own counsel, blah, blah, blah).
- Rasa has off-the-shelf integrations with Slack, FB messenger, Telegram, and Twilio.
- For debugging and training, there is an interactive learning mode where you interact with the bot and give feedback on responses.
- There is a CLI you can use to create new projects, split your data into train and test samples, test your models, and even visualize your dialogue flow.
I am always wary of any framework because I know I will spend precious time fighting with it to do something it does not (yet) have the capability to do. The NLU component is really a pipeline of models, so you can supposedly mix, match, and create your own. But really, how easy is it to do so?
Since I ❤️ fastai , the first thing I wanted to try was to switch out the Rasa pre-made classifier in the NLU pipeline with my own fastai text classifier. Join me on my journey …
fastai text uses transfer learning to fine-tune a pre-trained language model. This means you need less data, but you still need some data.
To get going, I found a dataset of over 3,000 general chatbot conversations here. This was prepared for a hackathon and should be helpful in training my model on the general sort of language humans use when chatting with a bot. I also found a dataset of classified questions people asked an Ubuntu help bot here. The ask-ubuntu dataset was prepared to do a performance study on different NLU models including the pre-built Rasa one. The ask-ubuntu set was labeled with 5 different possible intents — (1) Make Update, (2) Setup Printer, (3) Shutdown Computer, (4) Software Recommendation, and (5) None. This set was also labeled for named entity recognition (NER). For instance, the position of the value of “12.04” in the text would be labeled as “UbuntuVersion”.
First, I converted the ask-ubuntu dataset into the Rasa training data format using an ugly script I will spare you from having to see 😝. You’re welcome.
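For reference, Rasa 1.x accepts NLU training data in a Markdown format where examples are grouped under intents and entities are annotated inline. The entries below are invented for illustration (the real ones came from the converted ask-ubuntu data), but they show the shape of it, including an entity annotation for UbuntuVersion:

```md
## intent:Shutdown Computer
- how do I turn off my computer?

## intent:Setup Printer
- my printer won't start on [12.04](UbuntuVersion)
```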
From here on out I worked in a Jupyter notebook you can check out here if you want the nitty-gritty. ⚠️
I used Rasa’s CLI to handle splitting the data into a training set and a test set.
# will split 80% training/20% test by default
! rasa data split nlu
I created a pandas dataframe from the ask-ubuntu data with just the columns label (where the label was the intent) and text. I also created a pandas dataframe from the general chat conversations, filtered to only the rows from humans, with the columns label (where the label was always human) and text. I then combined the two dataframes so I would have one dataframe to use for training my language model. Note that the label column is ignored when training the language model; I only needed a label on the general chat dataframe to keep the format consistent.
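As a sketch, the combination step looks something like this (the speaker column name is an assumption about the chat dataset's schema, and the file handling is simplified away):

```python
import pandas as pd

def build_lm_training_frame(ubuntu_df, chats_df, speaker_col="speaker"):
    """Combine intent-labeled ask-ubuntu questions with the human side of
    the general chat conversations into one frame for language-model training.
    The label column is only kept so both frames share a format; the language
    model ignores it."""
    ubuntu = ubuntu_df[["label", "text"]]
    # Keep only the human utterances from the chat logs.
    chats = chats_df[chats_df[speaker_col] == "human"][["text"]].copy()
    chats["label"] = "human"  # placeholder label, ignored during LM training
    return pd.concat([ubuntu, chats[["label", "text"]]], ignore_index=True)
```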
I created my language model data bunch:
# Language model data
data_lm = TextLMDataBunch.from_csv(path, 'combined_chats_and_ubuntu_intents_for_lm_training.csv', text_cols='text')
I created my classifier model data bunch:
# Classifier model data using just the intents file
data_clas = TextClasDataBunch.from_csv(path, 'ubuntu_intents_for_clas_training.csv', vocab=data_lm.train_ds.vocab, text_cols='text', label_cols='label')
I saved the data bunches so I wouldn’t have to recreate them later:
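In fastai v1 terms, saving and reloading a DataBunch looks like this (the file names here are my own choices):

```python
# Persist both DataBunches so the tokenization work isn't repeated.
data_lm.save('data_lm.pkl')
data_clas.save('data_clas.pkl')

# Later, reload without re-tokenizing:
# data_lm = load_data(path, 'data_lm.pkl')
# data_clas = load_data(path, 'data_clas.pkl')
```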
I created a learner for my language model and trained it:
learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.5)
I unfroze my model layers and trained some more:
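The frozen and unfrozen training steps look roughly like this in fastai v1 (the epoch counts and learning rates are illustrative, not tuned values):

```python
# Train the new head with the pre-trained layers frozen.
learn.fit_one_cycle(1, 1e-2, moms=(0.8, 0.7))

# Unfreeze all layers and fine-tune the whole language model.
learn.unfreeze()
learn.fit_one_cycle(3, 1e-3, moms=(0.8, 0.7))
```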
I saved both the model and the encoder:
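In fastai v1 that is two calls (the names 'lm_fine_tuned' and 'lm_encoder' are my own choices):

```python
learn.save('lm_fine_tuned')       # full language-model weights
learn.save_encoder('lm_encoder')  # encoder only, reused by the classifier
```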
At this point, my language model could perform magical feats like predicting the next 20 words that would follow a given input:
learn.predict("where can", n_words=20)
Result: ‘where can you not find weight ? xxbos What do you do ? xxbos what kind of hobbies do you have’
On to the classifier!
I set up the classification learner using the encoder from the language model:
clas_learn = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5)
I trained the classifier for a number of epochs, unfroze layers and trained some more.
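Sketched in fastai v1, assuming the language-model encoder was saved under the name 'lm_encoder' (epochs and learning rates are illustrative):

```python
# Load the fine-tuned language-model encoder into the classifier.
clas_learn.load_encoder('lm_encoder')

# Train the classifier head, then unfreeze and fine-tune everything.
clas_learn.fit_one_cycle(2, 1e-2)
clas_learn.unfreeze()
clas_learn.fit_one_cycle(3, slice(1e-4, 1e-2))
```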
The fruits of my labor:
clas_learn.predict("how to i turn off?")
(Category Shutdown Computer, tensor(3), tensor([0.2320, 0.2100, 0.1573, 0.2532, 0.1475]))
clas_learn.predict("my printer won't start")
(Category Shutdown Computer, tensor(3), tensor([0.2139, 0.1912, 0.1828, 0.2253, 0.1868]))
Ok, the classification dataset only had 127 labeled examples and they were unbalanced, but we’re here to build a fastai-ified rasa chatbot, not start an Ubuntu help desk. Onwards!
Now, we’re getting serious.
Create a config that references our soon-to-be-defined FastaiClassifier instead of the SklearnIntentClassifier that usually appears in Rasa's pretrained_embeddings_spacy NLU pipeline. You can do this by creating a yml file directly within your Rasa project or in a notebook like so:
fastai_config = """language: "en"

pipeline:
- name: "SpacyNLP"
- name: "SpacyTokenizer"
- name: "SpacyFeaturizer"
- name: "RegexFeaturizer"
- name: "CRFEntityExtractor"
- name: "FastaiClassifier"

policies:
- name: MemoizationPolicy
- name: KerasPolicy
- name: MappingPolicy"""

%store fastai_config > ../fastai_config.yml
We kept the other steps in the pipeline since our FastaiClassifier is only doing intent classification, not entity extraction.
Let’s actually define the component.
Create a file `site-packages/rasa/nlu/classifiers/fastai_nlu.py` where your Rasa is installed. The file should look like the one below. Note that we only have to define the `process` method because our model is already trained and persisted. The `process` method will be called at inference time so our bot can figure out what the chatter’s intent is.
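A minimal sketch of what that file can look like, assuming the Rasa 1.x component API and a fastai learner exported with learn.export() — the model path and the way confidence is derived here are my assumptions, so adapt them to your own setup:

```python
from fastai.text import load_learner
from rasa.nlu.components import Component


class FastaiClassifier(Component):
    """Intent classifier backed by a pre-trained fastai text model."""

    name = "FastaiClassifier"
    provides = ["intent"]

    def __init__(self, component_config=None):
        super().__init__(component_config)
        # Path to the exported fastai learner is an assumption;
        # point it at the directory containing your export.pkl.
        self.learner = load_learner("/path/to/exported/model")

    def process(self, message, **kwargs):
        # Called at inference time on each incoming message.
        category, _, probs = self.learner.predict(message.text)
        message.set(
            "intent",
            {"name": str(category), "confidence": float(probs.max())},
            add_to_output=True,
        )
```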
Now that we have a new component, we have to make sure Rasa picks it up. Otherwise, all of those nifty CLI tools to test our model are for naught. The method Rasa gives in its custom component tutorial to accomplish this feat did not work for me, and it seems it has not worked for others on the Rasa forum either.
What does work is hacking your `site-packages/rasa/nlu/registry.py` file. Where this file is will depend on how you installed Rasa.
Simply add an import to this file:

from rasa.nlu.classifiers.fastai_nlu import FastaiClassifier

Then, there is a `component_classes` array at the top of the file. Add `FastaiClassifier` as an element in this array.
Hack complete! 🚧✅
Even though our fastai classifier is already trained, we need to train the entity extractor on our training data and give Rasa a chance to wrap our models in a handy zip file 🎁.
To do this, we simply run:
! rasa train --force nlu
- `--nlu` sets the path of the data file to use.
- `--config` sets the config to use (which should be the one we just created that references our FastaiClassifier).
Once this runs, it will output:
NLU model training completed.
Your Rasa model is trained and saved at '/home/jupyter/rasa-fastai/notebooks/models/nlu-20190728-223406.tar.gz'
Now we can run the test command:
! rasa test nlu
- `--config` sets our config again.
- `-u` sets our TEST data path this time.
- `--model` sets the path to the model zip file that the previous command provided as output.
- `--errors` is the path where you want Rasa to save a JSON file containing the model’s prediction errors.
- `--confmat` is the path where you want Rasa to save a confusion matrix of your predictions.
Our classifier’s accuracy is 94.3%.
It only got two wrong:
"text": "Security enhancements prevent mounting /dev/sdb1",
"name": "Software Recommendation",
"text": "How to provide user permission to read and write on /dev/sdax? What is the syntax?",
"name": "Software Recommendation",
And can you really blame it? Trying to train a “None” class is a bad idea. What is it trying to learn about the features of a “None” chat?
Confusion matrix just for fun and because Rasa did all the work to create it:
We can then easily compare our results against the default Rasa pipelines by creating new configs and running the `rasa train` and `rasa test` commands again for each.

On this dataset, the `pretrained_embeddings_spacy` pipeline with the `SklearnIntentClassifier` performed the same as our `FastaiClassifier`. Both performed better than the `supervised_embeddings` pipeline (91.4% accuracy), likely because the dataset is small. Rasa warns to only use `supervised_embeddings` if you have a large training dataset.
You made it to the end. And, indeed, fastai + Rasa = ❤️
There is much further work that can be done here:
- Training and testing with a larger training and test set.
- Approaching unknown intent in a different manner than trying to train a “None” class (perhaps assigning “None” if confidence on any class is not high enough).
- Training the Core component to have a scintillating Ubuntu conversation with our bot.
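The confidence-threshold idea for handling unknown intents could be sketched as a small post-processing step on the classifier's probabilities (the 0.5 threshold is an arbitrary illustration):

```python
def intent_or_none(probs, labels, threshold=0.5):
    """Return the top intent label, or "None" when the classifier
    is not confident enough in any class."""
    top = max(probs)
    if top < threshold:
        return "None"
    return labels[probs.index(top)]
```

For example, the "how to i turn off?" prediction above topped out at a probability of 0.2532, so with a 0.5 threshold it would map to "None" instead of Shutdown Computer.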
Please let me know if you build anything using Rasa + fastai or if you have any questions or suggestions. Thank you 🙏 for reading! 📚📚📚