This is article #3 in a series about the challenges of adding deep logic (expertise) to product support chatbots. I work for eXvisory.ai, an Oxford UK startup that provides visual dev tools to add deep logic to chatbots.
If you’ve ever developed a chatbot that automates a complex business process, like product support, you know it’s not easy. You smile wryly at marketing taglines that talk about building chatbots in 10 minutes (while fumbling for your caffeine tablets). A big part of the difficulty in non-trivial chatbot development is creating and maintaining the network of IF…THEN statements required to encode complex conversational flows. But why is it so hard? Aren’t conversations just simple decision trees?
That’s what this article is about — how the chatbot industry, while charting new territory, is rediscovering old problems in logic programming — namely how to scalably build and maintain large IF…THEN decision trees.
Stripped of their often beautiful developer user interfaces, most chatbot frameworks ‘under the hood’ combine rocket-science Natural Language Understanding (NLU) technology with Soyuz-era networks of chained-together IF…THEN statements. The NLU is used to extract intents and entities from free-text user utterances (questions and answers), and the chained-together IF…THEN statements are used to program conversational flow. The marketing makes it sound more glamorous, but if you’ve actually built a chatbot you’ll know this is what it boils down to 🙂
But what about AI? Where’s the AI in the above barebones description of chatbot frameworks? Well, the best chatbot frameworks use machine learning AI for NLU to accurately map user utterances to programmed intents and entities, regardless of how the user phrases their questions. This has been a huge step forward from keyword mapping or regular expressions. And the chained-together IF…THEN statements can proudly claim to be AI — the classic form of AI formerly known as logic programming.
To understand why chatbot decision trees get complicated fast have a quick scroll through a sample chat from my company’s eXvisory[mobile] device troubleshooter chatbot, diagnosing poor iPhone battery life. The chat session is essentially a scripted series of questions and answers, leading to a diagnosis. So it could be implemented in a decision tree, as sketched below.
At each node in the decision tree the system asks a (typically multiple-choice) question and, based on the answer, branches to the next node. What could be simpler, right? But in our sample chat there were 11 diagnostic questions and 5 house-keeping questions (about the user and their device). Multiplying together the number of choices for each of the 11 diagnostic questions gives the number of possible answer combinations — a staggering 25,920 different ways that conversation could have gone! In fact, for troubleshooting mobile devices that wasn’t a particularly complicated chat session, and it had mostly binary [yes|no] answers. Add in a few more 3- or 4-choice questions and you soon have a decision tree with over 100,000 nodes. That raw decision tree is going to look impenetrably complex in any user interface.
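The arithmetic behind that explosion is just multiplication. Here is a minimal Python sketch; the exact mix of choice counts per question is an assumption (mostly binary, a few multiple-choice), but any mix with the same product gives the same figure:

```python
from math import prod

# Hypothetical choice counts for the 11 diagnostic questions in the sample
# chat: six yes/no questions, four 3-choice questions and one 5-choice
# question. (The mix is an illustrative assumption, not from the chat log.)
choices_per_question = [2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 5]

# Each distinct sequence of answers is one path through the decision tree.
combinations = prod(choices_per_question)
print(combinations)  # 25920
```

Swap in a couple more 3- or 4-choice questions and the product races past 100,000, which is why the raw tree becomes unviewable.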
So is this combinatorial explosion a problem for every chatbot? No. Most chatbot applications implement business processes that are either shallow (only a few questions and answers chained together) or naturally fit a simple decision tree. Consider a conversational business process that only requires 4 questions, each having on average 3 answers. That’s only 3⁴ = 81 possible answer combinations, which will need a fair bit of scrolling back and forth but is probably manageable. Or consider a business process I’ll call sieving, for example selecting a product from a catalogue. Select the product. Select the model. Select product attributes (like colour, storage, etc.). Rinse and repeat. This narrowing down of choices naturally fits a simple decision tree, although if your catalogue is large you’re going to have to programmatically build the decision tree from a database of products and product attributes.
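Sieving can be sketched in a few lines of Python. The catalogue, attribute names and values below are invented for illustration; the point is that each question simply filters the remaining items, so the conversational flow is naturally tree-shaped:

```python
# Toy product catalogue (invented for illustration).
catalogue = [
    {"product": "phone", "model": "A1", "colour": "black", "storage": "64GB"},
    {"product": "phone", "model": "A1", "colour": "white", "storage": "128GB"},
    {"product": "tablet", "model": "T2", "colour": "black", "storage": "256GB"},
]

def sieve(items, attribute, answer):
    """One decision-tree level: keep only the items matching the user's answer."""
    return [item for item in items if item[attribute] == answer]

# Each question narrows the remaining choices. Rinse and repeat.
remaining = sieve(catalogue, "product", "phone")
remaining = sieve(remaining, "colour", "black")
print(remaining)
```

For a large catalogue you would generate the questions themselves from the database (ask only about attributes that still distinguish the remaining items), but the filtering logic stays this simple.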
There are also applications that fit sparse decision trees. This is where the chatbot business process involves multiple mini-conversations that are independent, but loosely linked together. Each mini-conversation is shallow, so only requires a small decision tree, and is independent of the other mini-conversations, so doesn’t depend on them logically. For example a “Would you like a weather forecast?” mini-conversation might ask your location and timescale then link to a mini-conversation about buying sun-cream.
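A sparse tree of linked mini-conversations can be sketched as a set of small handlers, each returning the name of the next one. The conversation names and canned answers here are invented for illustration:

```python
# Each mini-conversation is shallow and self-contained; the only coupling is
# the name of the next mini-conversation it hands off to.
def weather_forecast(answers):
    print(f"Forecast for {answers['location']} ({answers['timescale']}): sunny")
    return "suncream"            # loose link, not a logical dependency

def suncream(answers):
    print("Sunny out -- would you like to buy some sun-cream?")
    return None                  # end of chat

mini_conversations = {"weather": weather_forecast, "suncream": suncream}
answers = {"location": "Oxford", "timescale": "today"}

current = "weather"
visited = []
while current:                   # each hop ignores earlier conversation state
    visited.append(current)
    current = mini_conversations[current](answers)

print(visited)  # ['weather', 'suncream']
```

Because no mini-conversation inspects the internals of another, each one stays a small, independently maintainable decision tree.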
But for complex business processes (requiring say 6 or more non-binary questions and answers) chatbot decision tree complexity is a BIG problem. In the product support example I used earlier there were more than 25,000 possible paths through the decision tree. However, in real-world decision trees many questions are reused in different contexts. For example, while troubleshooting mobile device problems the question “Is device in flight mode?” is asked when diagnosing WiFi or cellular voice or cellular data or Bluetooth problems. So perhaps this redundancy can reduce the number of nodes in the decision tree?
It can, by turning the decision tree into a decision graph, where the answer to any question can lead to any other question, not just a single successor question in a branching decision tree (see above for a sketch of a decision graph fragment). But now the logic associated with each question is much more complex. In a decision tree each question branched to n successor questions, depending only on its answer, and didn’t have to explicitly take into account the questions that had been asked before. In a decision graph each question (if reused in different contexts) has to logically combine answers to previous questions to figure out which question to ask next. Its next question depends on the path taken to reach it as well as on its answer. You can see this intuitively by realising that in a product support conversation you only want to be asked questions relevant to all your previous answers. So we’ve just swapped a huge decision tree with simple connections for a smaller decision graph with more complex connections — traded “can’t see the wood for the trees” for a “giant bowl of spaghetti”. Both are incomprehensible at scale.
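To make the extra complexity concrete, here is a toy Python sketch of the routing logic a reused question needs in a decision graph. The question names and routing rules are invented; the point is that the successor depends on earlier answers (the path), not just on this question’s own answer:

```python
def next_after_flight_mode(answers):
    """Routing for the reused question 'Is device in flight mode?'.

    `answers` accumulates everything asked so far -- the node must inspect
    it to know which context (WiFi, cellular, Bluetooth...) it was reached in.
    """
    if answers["flight_mode"] == "yes":
        return "disable flight mode and retest"
    # Flight mode ruled out: the next question depends on which problem
    # area previous answers put us in, i.e. on the path taken to get here.
    if answers["problem_area"] == "wifi":
        return "is wifi enabled?"
    if answers["problem_area"] in ("cellular voice", "cellular data"):
        return "is a SIM card inserted?"
    return "is bluetooth enabled?"

# Same question, same answer, different successor -- because of the path:
print(next_after_flight_mode({"problem_area": "wifi", "flight_mode": "no"}))
print(next_after_flight_mode({"problem_area": "cellular data", "flight_mode": "no"}))
```

Multiply this context-sensitive routing across every reused node and you get the “giant bowl of spaghetti”.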
The fundamental problem with decision trees and decision graphs is that neither contains a semantic model of the underlying business process. Both are logic networks that can in theory model any logical process. And that’s precisely the problem. Because they can model anything they have no constraints or ‘shape’ to guide the developer to fit them to a specific business process. Returning to our product support example: an expert human fault finder has a sophisticated, high-level mental model that guides their approach to troubleshooting. That mental model may include prioritising questions that eliminate the largest number of faults from consideration, that scope the fault to a particular class of problems, that address more common problems, or have a lower cost associated with them (are easier to answer). That mental model is different for different business processes and is the essence of what we term ‘expertise’. ‘Experience’ is about individual questions but ‘expertise’ is knowing how to connect them together 🙂
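One of those expert heuristics — prefer the question that eliminates the most candidate faults per unit of cost — can be sketched in a few lines. The fault names, coverage sets and costs below are invented for illustration:

```python
# Toy model of one expert prioritisation heuristic: score each question by
# how many remaining candidate faults it can eliminate, per unit cost
# (cost here meaning roughly how hard the question is to answer).
candidate_faults = {"battery worn", "rogue app", "flight mode", "bad cable"}

questions = {
    "is device in flight mode?":   {"eliminates": {"flight mode"}, "cost": 1},
    "does safe mode fix it?":      {"eliminates": {"rogue app"}, "cost": 3},
    "is battery health below 80%?": {"eliminates": {"battery worn", "bad cable"},
                                     "cost": 1},
}

def score(question):
    info = questions[question]
    return len(info["eliminates"] & candidate_faults) / info["cost"]

best = max(questions, key=score)
print(best)  # is battery health below 80%?
```

Nothing in a raw decision tree or graph editor expresses, enforces or even hints at a policy like this — it lives only in the expert’s head.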
There is nothing in the structure of naive decision trees or decision graphs to model this high-level or semantic model of a business process. At best it exists in the mind of the chatbot developer and hopefully in the form of copious comments within the decision tree or decision graph editor. But nothing in the decision tree or decision graph editor helps the developer discover or apply a high-level semantic model. Nothing prevents multiple developers from trampling all over each other’s mental models. Basically, welcome to non-trivial logic programming. It’s hard.
Oxford UK startup eXvisory.ai is all about how to scalably and maintainably program deep conversational logic into chatbots. Our product is a framework based on visual editors, specialised deep logic networks and constrained rule palettes. The idea is that chatbot developers choose rules from a constrained (and therefore comprehensible) palette and fit them into a logic network ‘shape’ optimised for specific problems (like fault diagnosis) in a manner reminiscent of solving a jigsaw puzzle. The network editor guides the developer in discovering and configuring new rules and keeps the resulting logic network nicely compartmentalised so that adding new rules doesn’t break existing ones. The developer simply presses a button to generate the engine code that implements the complex IF…THEN decision graphs that orchestrate a complex chatbot conversational flow.
For example, our eXvisory[mobile] demonstrator chatbot can find over 100 non-trivial mobile phone or tablet problems, via chat sessions with an average of 16 questions (including house-keeping). The automatically generated Java logic code is 269KB. That’s a lot of IF…THEN statements and would be impossible to hand-code. But the code is easy to relate to the higher-level visual editor and, with a Java debugger, is super easy to debug!
To be notified of more deep logic articles follow me on my Medium profile page. Or say hello at firstname.lastname@example.org if you’d like access to our dev documentation, online tutorials, a web demo — or a free dev instance to build your own deep logic chatbot. If you enjoyed this article please recommend it to others by clapping madly (below) and sharing or linking to it. You can also leave feedback below.