I have worked on building internal employee-facing chatbots for more than two years. Our current chatbot platform only allows 10 training examples per intent. To many people, 10 training examples sounds like a lot, because it is hard for one person to come up with 10 different ways to ask the same question. However, with hundreds of thousands of users, it is very common to see more than 10 phrasings of the same intent. Most chatbot tools nowadays have some degree of natural language understanding (NLU) or natural language processing (NLP) capability, which means we don’t need to provide every possible variation as a training example; we only need to provide the most popular and representative ones. But will 10 be enough?
In this article, I will list real user queries for a very common intent in the human resources domain and analyze where the variations come from, so that you can estimate the number and type of training examples you need to train an intent. At the end, I also list some tips on evaluating chatbot tools.
The sample intent is to get my employer’s holiday schedule for the current year. Let’s assume my employer’s name is Microsoft.
The sample response to the above intent is:
- New Year’s Day — January 1
- Martin Luther King Day — January 20
- President’s Day — February 17
- Memorial Day — May 25
- Independence Day — July 3
- Labor Day — September 7
- Columbus Day — October 12
- Veterans’ Day — November 11
- Thanksgiving Day — November 26
- Day After Thanksgiving — November 27
- Christmas — December 25
I want to pause right here and ask you to think about:
- How would you express this intent to a chatbot?
- How many different ways can you think of to express this intent to a chatbot?
I found 31 different user queries that could match the sample intent.
- What is the company’s holiday schedule for 2020?
- What are the 2020 company holidays?
- what are the extra holidays in 2020?
- Where can I find the holiday calendar?
- what is this year holiday schedule?
- what are the Microsoft holidays?
- What holiday do I get off?
- what are our 2020 holidays
- list of Microsoft holidays
- list of holidays
- 2020 holiday calendar
- 2020 paid holidays
- 2020 holiday schedule
- 2020 vacation days
- 2020 holidays
- 2020 vacation
- company holiday
- corporate holidays
- microsoft holiday 2020
- microsoft holiday
- ms holiday
- ms holidays
- is Christmas Eve 2020 a holiday?
- is Columbus day a Microsoft holiday?
- is July 5 a holiday?
- Which day do we get off for 4th of July?
- What’s the next holiday?
- When is my next holiday?
- Where is the vacation calendar?
The above variations mainly come from different word and phrase choices, different syntactic structures, and the need for complete or partial responses.
- Common synonyms — holiday / holidays / vacation / vacations, schedule / calendar, company / corporate
- Synonyms only in specific context — MS / Microsoft / company / our, this year / 2020, holiday / get off
- Acronyms — MS / Microsoft
- Incorrect spacing — Microsoft / Micro Soft
- Typos — vacation / vocation / vaacation
Most intent identifiers nowadays can handle some typos and the most common synonyms. However, for special acronyms like “MS” for “Microsoft”, and for words that share a meaning only in a specific context, like “our” and “company”, an intent identifier may not recognize the similarity without explicit training examples.
- Different syntactic structure of full sentences — What is the company’s holiday schedule? / Where can I find the holiday calendar?, What is the next holiday? / When is the next holiday?
- Full sentences / key phrases — What is the company’s holiday schedule? / list of holidays
- Informal language and incorrect grammar
One could argue that “What is the schedule?” asks for the schedule while “Where is the schedule?” asks for its location, and that they are therefore two separate intents. However, in this case, most users who ask for the location want to see the schedule; “Where is the schedule?” is shorthand for “Where is the schedule? I want to see it.” Also, responding with the schedule directly saves users more time than showing where to get it. For example, if the response only provides a link to the schedule, it costs the user one more click than putting the schedule in the response. Therefore I prefer to combine those two intents.
Another issue is that some users prefer keywords to full sentences, which makes the intent ambiguous. For example, if a user asks “holiday”, do they want the employer’s holiday schedule, the federal holidays, a holiday event, the holiday gift service, or something else related to holidays? The intent recognizer does not have enough information to rank all the possible intents and might not return the most popular ones for these keywords. Sometimes we add keywords and phrases as training examples to the most popular intents to make sure those intents rank higher for such queries.
Informal language (e.g. acronyms and emojis) and incorrect grammar are also very common in user queries. Suppose internal employees call the company holiday schedule “CHS”; the intent identifier will not understand it without a training example stating that “What is CHS?” means the same as “What is the company holiday schedule?”
Some users don’t want to see the complete holiday schedule; they only need the date of a specific holiday. In some situations, it might be worth creating one intent for each holiday, for example, “Which day do we get off for 4th of July?” or “What is the holiday schedule for Christmas?” However, even after creating 11 more intents, one for each holiday, there are still queries like “What is the next holiday?” and “Is July 5th a holiday?”, and it is probably not worth adding another 365 intents, one for each day of the year. An intent is usually a group of many sub-intents. Although I want my chatbot to understand all the smallest intents and answer with exactly the required information, no more and no less, that is not feasible with a small team. In practice, we usually group all the intents that share the same response together, so that if the response needs an update, we only need to update one intent. Therefore, to train an intent, we need to provide training examples covering its different sub-intents.
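The grouping described above can be sketched in a few lines. This is a minimal illustration, not a real chatbot platform API; the intent name, responses, and example-to-intent mapping are all hypothetical:

```python
# A minimal sketch of grouping sub-intents under one intent that shares a
# single response. Names like "holiday_schedule" are hypothetical.

# One response record, shared by every sub-intent grouped under the intent.
responses = {
    "holiday_schedule": "New Year's Day - January 1\nMartin Luther King Day - January 20\n...",
}

# Training examples from several sub-intents all map to the same intent,
# so updating the holiday list means editing a single response.
training_examples = {
    "What is the company's holiday schedule for 2020?": "holiday_schedule",
    "Which day do we get off for 4th of July?": "holiday_schedule",  # per-holiday sub-intent
    "What's the next holiday?": "holiday_schedule",                  # relative-date sub-intent
    "Is July 5 a holiday?": "holiday_schedule",                      # yes/no sub-intent
}

def respond(intent_name: str) -> str:
    """Look up the shared response for a matched intent."""
    return responses[intent_name]
```

The trade-off is visible in the data: every sub-intent gets the full schedule back, which is more than some users asked for, but the content lives in exactly one place.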
I don’t know exactly how many training examples are needed for an intent. It depends on how intelligent your chatbot is, how broad your intents are, and how high the users’ expectations are for the chatbot.
When choosing chatbot tools, the ability to handle typos and grammar mistakes is a must. It is hard to teach a chatbot to recognize typos through training examples because there are too many possible typos to enumerate. However, typos are so common in user queries that if your chatbot cannot handle them, it will lead to a very bad user experience. When a chatbot responds “Sorry, I don’t understand,” users won’t blame themselves for their typos and grammar mistakes; they will just say the chatbot is not intelligent.
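To make the point concrete, here is a rough sketch of typo tolerance using fuzzy string matching. Real NLU engines use far more sophisticated models; Python’s standard-library `difflib` is used here only as an illustration, and the example queries are assumptions:

```python
# A rough sketch of typo tolerance: misspelled queries still land on a
# known training example via fuzzy matching, instead of enumerating typos.
import difflib

training_examples = [
    "2020 vacation days",
    "2020 holiday schedule",
    "list of holidays",
]

def match_with_typos(query: str, cutoff: float = 0.6):
    """Return the closest training example, tolerating small typos."""
    matches = difflib.get_close_matches(query, training_examples, n=1, cutoff=cutoff)
    return matches[0] if matches else None

# "vaacation" still resolves to the vacation example.
print(match_with_typos("2020 vaacation days"))  # -> 2020 vacation days
```

The key design point is that typo handling lives in the matcher, not in the training data, so no one has to enumerate misspellings.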
The ability to understand common synonyms is also very important. For example, if a training example has three words and each word has three common synonyms, then without synonym understanding you need to provide 3 * 3 * 3 = 27 training examples just to cover the basic variations. Ideally, the chatbot tool should also allow you to define your own global synonyms as well as synonyms within a specific context.
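The combinatorial explosion above is easy to see by enumerating the surface forms directly. The word lists below are illustrative assumptions, not the article’s actual training data:

```python
# Why synonyms multiply training examples: three slots with three options
# each yield 3 * 3 * 3 = 27 distinct queries to cover by hand.
from itertools import product

slots = [
    ["company", "corporate", "Microsoft"],
    ["holiday", "vacation", "time-off"],
    ["schedule", "calendar", "list"],
]

variants = [" ".join(words) for words in product(*slots)]
print(len(variants))  # -> 27
print(variants[0])    # -> company holiday schedule
```

With a 10-example cap, one short query pattern alone would already exceed the limit; built-in synonym handling collapses all 27 variants back into a single training example.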
The ability to understand sentences with different syntactic structures is nice to have, but I wouldn’t rely too much on that. Questions that look very different but mean similar things usually need to be added as training examples.
The ability to understand sub-intents is rarely seen in current chatbot tools, especially if the chatbot is trained only on questions. If a chatbot is trained on both questions and responses, and the responses carry traces of the sub-intents, there is a chance it can understand sub-intents without additional training examples.
The ability to handle ambiguous intents improves the user experience as well. When the chatbot cannot determine the intent, why not offer multiple options for the user to choose from? It is not practical to expect users to always express their intent clearly in full sentences with perfect language.
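One way this could work is a confidence check over the intent scores: answer directly when one intent clearly wins, and otherwise present the top candidates as options. This is a hedged sketch; the thresholds, intent names, and scores are all made up for illustration:

```python
# Ambiguity handling sketch: return one confident intent, or a short list
# of candidates for the user to choose from instead of guessing.

def disambiguate(scores: dict, threshold: float = 0.7, margin: float = 0.15):
    """Answer directly if one intent clearly wins; otherwise offer options."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best = ranked[0]
    second_score = ranked[1][1] if len(ranked) > 1 else 0.0
    if best[1] >= threshold and best[1] - second_score >= margin:
        return best[0]                       # confident: answer directly
    return [name for name, _ in ranked[:3]]  # ambiguous: let the user choose

# The bare keyword "holiday" scores several intents about equally,
# so the bot should ask rather than pick one.
print(disambiguate({
    "company_holiday_schedule": 0.41,
    "federal_holidays": 0.38,
    "holiday_gift_service": 0.21,
}))
```

A full sentence like “What is the company’s holiday schedule?” would push one score well past the others, and the same function would return a single intent instead of a menu.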
Different intents require different numbers of training examples. It is fine to have a maximum number of training examples, but 10 is too small in my opinion. Dialogflow allows 250 training examples per intent. That doesn’t mean every intent will have 250 training examples, but at least we can provide more when needed.
I have heard many times from different chatbot vendors that their tool needs only one training example per intent. At first, I got really excited hearing it. Now it is a huge red flag for me. I prefer vendors who are honest about what their chatbot can handle and show us ways to optimize, rather than vendors who claim how easy it is to build a chatbot with their tool, which turns out to be useless.
Thanks for reading! If you are interested in user intents and why they are so hard to understand, please read another article of mine below:
Any feedback is welcome!