As conversational language interfaces come to dominate customer service, the backlash against chatbots is growing too. Forrester predicted last year that 2019 would be the year of the backlash against inefficient chatbots, and it looks like they were right. For example, a survey commissioned by the open software service company Acquia analyzed responses from more than 5,000 consumers and 500 marketers in North America, Europe and Australia and found that 45 percent of consumers find chatbots “annoying”.
At the same time, the importance of conversational AI for business today cannot be overestimated. When done right, conversational AI has the ability to significantly increase your competitive advantage and fundamentally change the nature of business-customer interaction.
According to Gartner, by 2021 more than 50% of enterprises will spend more per annum on bots and chatbot creation than on traditional mobile app development. By 2022, chatbots are expected to cut business costs by $8 billion, and by 2023 businesses and consumers will save 2.5 billion hours through the use of chatbots.
By 2024, the overall market size for chatbots worldwide is predicted to exceed $1.3 billion.
So, what are the reasons for the growing consumer dissatisfaction with chatbots and how do businesses ensure their digital assistants have a positive effect on their relationship with their customers?
Some reasons are straightforward and well known. For instance, many companies rely on chatbots to do too much while failing to provide a human “safety net” (that is, a quick and efficient way to hand over to a live agent if the need arises). Some chatbots are too slow to respond (and some are unnaturally fast). In other cases, the chatbot’s personality simply does not align with the brand’s voice. Some businesses deploy a half-baked chatbot and end up training it on their customers, which is often a fatal mistake.
But one huge and rarely discussed reason that customers increasingly feel chatbots fail to deliver is simply the complexity of the task at hand. The more conversational AI permeates our interactions with technology, the more people expect these interactions to resemble natural human conversations, with intelligent responses to their naturally phrased questions and requests. The problem is that as soon as you move away from simple command-response interactions, you end up in the incredibly complex terrain of human language. Something that we take totally for granted (talking), and that even a three-year-old child is able to do, is incredibly difficult to teach machines to do.
Here are a few common features of the human language that we use without thinking every time we open our mouths to speak, but that can be impossibly difficult for many chatbots to master.
Sometimes we use words with their literal meaning, but in many cases we use them as part of an idiom. An idiom is a group of words whose meaning cannot be deduced from the meanings of the individual words, but instead must be memorized.
In the following blunder, Siri interprets the idiom “make a note” literally, deducing it from the meaning of individual words, while the person is actually using it as an expression meaning “remember something”:
While humans rely on context and other cues to determine whether the words are used as an idiom or not, many chatbots cannot easily do so. You can only imagine the awkward interactions that arise when other idioms are interpreted literally by chatbots. Examples include “make your blood boil” in the sense of “make someone angry” (would a medical chatbot direct you to symptoms and causes?), “make up your mind”, “make your day”, “it’s raining cats and dogs”, “break a leg”, “up in the air”, and “a piece of cake”.
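As a minimal sketch of the "memorized" nature of idioms, the snippet below checks an utterance against a small lookup table before any literal interpretation would happen. The table contents and the function names (`IDIOMS`, `normalize`, `find_idioms`) are assumptions made for illustration, not part of any particular chatbot product. Note that a match only flags a possible idiom: deciding whether “make a note” is meant literally still requires context.

```python
import re

# Hypothetical idiom table: phrase -> intended meaning (illustrative only).
IDIOMS = {
    "make a note": "remember something",
    "raining cats and dogs": "raining heavily",
    "break a leg": "good luck",
    "a piece of cake": "very easy",
    "up in the air": "undecided",
}

def normalize(text: str) -> str:
    """Lowercase the text and drop punctuation so phrases match as whole words."""
    return " ".join(re.findall(r"[a-z']+", text.lower()))

def find_idioms(utterance: str) -> list[tuple[str, str]]:
    """Return (idiom, meaning) pairs whose phrase occurs in the utterance."""
    padded = f" {normalize(utterance)} "
    return [(phrase, meaning) for phrase, meaning in IDIOMS.items()
            if f" {phrase} " in padded]

if __name__ == "__main__":
    print(find_idioms("It's raining cats and dogs!"))
    print(find_idioms("Please bake a cake"))
```

A lookup like this can only cover idioms someone has listed in advance, which is one reason literal misreadings are so common.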
The meaning of nouns, verbs and other lexical items can be taught. But pronouns are tricky because they refer to something previously mentioned in the context. Each pronoun needs an antecedent, i.e. an entity or individual that it refers to that is mentioned earlier in the context. But in some cases there is more than one possible antecedent, and the job of the listener is to pick the right one based on context.
So let’s say you have just left home for work and are contacting your digital assistant with the following request:
Human: Switch off the light in the living room and turn off the coffee pot
Human: Turn it back on at 6pm today
“It” in the second sentence here is ambiguous, as it can refer to either the coffee pot or the living room light. A well-designed chatbot should be able to notice the ambiguity and ask a clarifying question.
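The behaviour described here can be sketched in a few lines: collect the candidate antecedents from the previous utterance and ask a clarifying question when there is more than one. The device list and the function names (`KNOWN_DEVICES`, `mentioned_devices`, `resolve_it`) are hypothetical, and plain substring matching stands in for a real entity recognizer.

```python
# Hypothetical device vocabulary; a real assistant would use an entity recognizer.
KNOWN_DEVICES = ["light", "coffee pot", "thermostat"]

def mentioned_devices(utterance: str) -> list[str]:
    """Return known devices in the order they appear in the utterance."""
    text = utterance.lower()
    found = [(text.find(device), device) for device in KNOWN_DEVICES if device in text]
    return [device for _, device in sorted(found)]

def resolve_it(previous: str, current: str) -> str:
    """Resolve 'it' in `current` against the devices mentioned in `previous`."""
    if " it " not in f" {current.lower()} ":
        return "no pronoun to resolve"
    candidates = mentioned_devices(previous)
    if not candidates:
        return "Sorry, which device do you mean?"
    if len(candidates) == 1:
        return f"resolved: {candidates[0]}"
    # More than one possible antecedent: ask instead of guessing.
    return "Do you mean the " + " or the ".join(candidates) + "?"

if __name__ == "__main__":
    print(resolve_it(
        "Switch off the light in the living room and turn off the coffee pot",
        "Turn it back on at 6pm today",
    ))
```

The design choice worth noting is the last branch: when the candidates cannot be narrowed down, the sketch asks rather than silently picking the most recent mention.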
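In other cases, knowledge of the world and of social conventions helps pick the right antecedent. Consider this sentence:
“Jennifer invited Amanda for a visit, and she gave her a good dinner.”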
A human may figure out from the context that “she” here most likely refers to Jennifer while “her” to Amanda, just because we know that it’s more likely for the person who extends the invitation to provide dinner.
Now consider this:
“Jennifer invited Amanda for a visit, and she gave her a pretty necklace.”
Now who gave the necklace to whom? There are fewer social conventions associated with giving necklaces than with serving dinner, so even humans would be unsure of the intended meaning here.
“Jennifer invited Amanda for a visit, but she told her she was going out of town.”
It would be strange for Jennifer to invite Amanda for a visit and immediately let her know about her travel plans for that day. This is how we know that it is Amanda who is going out of town and therefore cannot accept the invitation.
Again, it’s only the context and the knowledge of social norms that lets us figure out the correct meaning, and your conversational AI needs to be advanced enough to be able to learn it.
Now, consider just how many possible antecedents the pronoun “that” can have in any given context.
“Anna told Brian that she had decided to spend a year in England to study creative writing.”
“That would be her life’s work.” (that = creative writing)
“After she had done that, she would come back to live with him.” (that = spending a year in England)
“That was what she was thinking about the whole week.” (that = deciding)
“That started a two-hour fight.” (that = telling Brian)
Now, how would you teach the average chatbot something like that?
Some ambiguity is structural. For example, the sentence “The chicken is ready to eat” can mean either that the chicken is on the table, or that you need to feed your pet chicken. A chatbot needs to be taught a lot of context both about the world and about your personal situation (e.g. that you personally don’t have a pet chicken, or that chicken is a popular dish but is not a popular pet, or on the contrary that you’re a vegan and have a pet chicken) in order to interpret it correctly.
When the sentence contains a temporal adverbial and more than one verb, it is not always clear which verb the adverbial modifies.
For example, the sentence “Jamie said on Friday we will have a party” can mean two things:
- On Friday Jamie said that we will have a party
- Jamie said that on Friday we will have a party
Again, in a natural conversation, humans will either have enough context to determine which meaning is used or will know to ask clarifying questions, but most chatbots cannot discern structural ambiguity when it arises.
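To make the two attachments concrete, here is a toy sketch that recognizes one narrow sentence shape and lists both readings so that a dialogue system could ask which one was meant. The regular expression and the `readings` function are assumptions for illustration only; this is pattern matching on a single template, not a syntactic parser.

```python
import re

# Toy pattern: "<speaker> said <on DAY> <clause>". Illustrative only, not a parser.
PATTERN = re.compile(r"^(?P<speaker>\w+) said (?P<when>on \w+) (?P<clause>.+)$")

def readings(sentence: str) -> list[str]:
    """List both attachments of the temporal adverbial if the pattern matches."""
    match = PATTERN.match(sentence.strip().rstrip("."))
    if match is None:
        return [sentence]  # this toy pattern detects no ambiguity
    speaker, when, clause = match["speaker"], match["when"], match["clause"]
    fronted = when[0].upper() + when[1:]
    return [
        f"{fronted} {speaker} said that {clause}",  # adverbial modifies "said"
        f"{speaker} said that {when} {clause}",     # adverbial modifies the embedded verb
    ]

if __name__ == "__main__":
    for reading in readings("Jamie said on Friday we will have a party"):
        print("-", reading)
```

Whenever `readings` returns more than one item, the system knows it either needs more context or should ask a clarifying question.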
In informal spoken conversation we often omit information that we know the other person can easily fill in from the context. For example, we can say “I’m allergic to dairy. Also fish”, and it will be understood that we are allergic to fish as well as dairy, rather than that fish are allergic to dairy.
Sometimes ellipsis causes ambiguity. For example:
- Mike loves his mother and Bill does too. (Does Bill love Mike’s mom or his own?).
- Anna is in her room and Jane is too (is Jane in Anna’s room or in her own?).
In some cases, in speech we may be able to figure out the exact meaning of an ambiguous sentence with the help of prosodic cues (intonation, tone, pitch, etc.).
For example, the sentence “They are cooking apples” can mean two things:
- They [i.e. some people] are cooking apples.
- These apples are for cooking.
Humans rely both on context and intonation to interpret the sentence correctly, but many chatbots do not have access to these.
Not every question needs an answer. Some questions are rhetorical and serve to simply convey the emotional state of the speaker rather than seek an answer.
For example, the question “Do you know what time it is?” is generally appropriately answered with a statement of the time (“It’s 11:30 am”) unless it is used in a context such as the following.
“You said we will be there by 11:00! Do you know what time it is?”
If your chatbot answers this question with “It’s 11:30 am” there is no reason it shouldn’t be labeled as “annoying”.
Furthermore, if a chatbot is not able to understand when a customer is being sarcastic, the interaction will not go well. For instance, imagine that in response to an unsatisfying answer from a chatbot the customer says “Just what I needed. What am I supposed to do with this?” If your chatbot does not detect the sarcastic tone but replies “you’re welcome” or attempts to answer the question, this chatbot will be perceived as extremely irritating.
Finally, humour is possibly the best example of how difficult it is to make chatbots interact like humans. In the following nugget of humour, the British comedian Jimmy Carr uses the ambiguity and the vagueness of the question asked to create a nonsensical humorous interaction:
“A lady with a clipboard stopped me in the street the other day. She said, ‘Can you spare a few minutes for cancer research?’ I said, ‘All right, but we’re not going to get much done.’”
To cope with all these potential pitfalls, you need a truly advanced conversational AI system, one that is flexible and teachable enough to accommodate such subtleties of human language. Simple command-and-control chatbots are not designed to handle them.
The problem with most standard chatbots today is that they sell a promise of “AI” with “machine learning” as the silver bullet. However, chatbots that are based solely on machine learning are essentially black-box systems that cannot work without vast amounts of curated training data. If they don’t immediately understand what you expect them to, they cannot be easily tuned or enriched by the developer. The only way to make them “change their mind” is to add more data.
This data needs to be not only available in large quantities, but also accurate AND classified AND machine readable AND relevant. Rarely will a company have ALL of those boxes ticked for all necessary features of every language.
Fortunately, there is a solution. The Teneo platform from Artificial Solutions uses a hybrid approach that relies on a combination of machine learning and linguistic learning.
That is, it takes advantage of the data that does exist, without relying on it exclusively. Linguistic learning does not use statistical data in the same way as a machine learning system does. Instead, it is a rule-based system that allows for human oversight and fine-tuning of the rules and responses. It therefore allows much more control over how questions are understood and how responses are given. With linguistic learning, if you want the chatbot to interpret a specific sentence differently, you can simply tell it to by reprogramming the rules (whereas in machine learning, the system needs to be convinced to interpret the sentence differently by being shown lots and lots of counter-examples).
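The general idea of combining hand-written rules with a statistical model can be sketched as follows. This is not Teneo's API or implementation: the rule, the intent names, the tiny example set, and the word-overlap scorer (a stand-in for a trained classifier) are all assumptions chosen to keep the example self-contained.

```python
import re
from collections import Counter

# Hand-written rules, checked first. Pattern and intent names are hypothetical.
RULES = [
    (re.compile(r"you said .*do you know what time it is", re.I), "complaint"),
]

# Tiny labelled set standing in for real training data.
EXAMPLES = [
    ("what time is it", "ask_time"),
    ("do you know what time it is", "ask_time"),
    ("this is useless", "complaint"),
    ("you are not helping me", "complaint"),
]

def tokens(text: str) -> list[str]:
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def train(examples):
    """Count word frequencies per intent (a stand-in for a trained model)."""
    counts = {}
    for text, intent in examples:
        counts.setdefault(intent, Counter()).update(tokens(text))
    return counts

def statistical_intent(counts, text: str) -> str:
    """Pick the intent whose training words overlap most with the utterance."""
    words = tokens(text)
    return max(counts, key=lambda intent: sum(counts[intent][w] for w in words))

def classify(counts, text: str):
    """Rules take priority; the statistical model handles everything else."""
    for pattern, intent in RULES:
        if pattern.search(text):
            return intent, "rule"
    return statistical_intent(counts, text), "model"

if __name__ == "__main__":
    model = train(EXAMPLES)
    print(classify(model, "Do you know what time it is?"))
    print(classify(model, "You said we will be there by 11:00! Do you know what time it is?"))
```

In this sketch the statistical scorer alone labels the second utterance as a request for the time; adding one rule changes that interpretation without collecting any new training data, which is the kind of direct control the paragraph above describes.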
In short, Teneo’s linguistic abilities allow developers to teach the system the correct response directly and then enable it to refine its performance using machine learning. It is this flexible hybrid approach that enables enterprises to create reliable conversational AI solutions that help them build and improve a relationship with their customers, not sabotage it. What will you do to build a chatbot that’s not labelled as “annoying”?