Adding deep logic to Google DialogFlow
This is article #4 in a series about the challenges of adding deep logic (expertise) to product support chatbots. I work for eXvisory.ai, an Oxford UK startup that provides visual dev tools to add deep logic to chatbots.
My article Chatbot Decision Trees: seriously, how hard can they be? seems to have struck a chord with chatbot developers. It showed that non-trivial conversations are extremely difficult to orchestrate because they require decision graphs of thousands of interconnected IF…THEN statements, which at that scale are impossible to understand and manage. That's why most chatbots are still so shallow (shop assistants rather than experts).
This article shows how eXvisory.ai solves this complexity problem by adding deep logic networks to natural language chatbots.
What’s the goal?
I’ll show you how we built a ‘deep logic’ phone troubleshooter bot using eXvisory with Google DialogFlow (published to Google Assistant, Telegram and Kik) but our hybrid approach is applicable to IBM Watson, Amazon Lex, Microsoft Bots, SAP Recast or any webhook-capable framework.
The goal is to build a chatbot to guide users through deep, systematic troubleshooting of mobile phone or tablet problems. What do I mean by deep? A typical troubleshooting happy path requires at least a dozen questions and answers, with hundreds of potential questions depending upon all the previous answers. There isn't just a handful of happy paths — there are thousands!
We also want to leverage the Natural Language Understanding (NLU) capabilities of Google DialogFlow to make the user interaction less scripted and more two-way, and DialogFlow's integration capabilities to deploy our bot to Google Assistant and the Telegram and Kik messaging platforms.
How hard can this be?
Incredibly hard. To see why, let's look at how Google DialogFlow orchestrates conversational flows. For brevity I'm going to assume you understand the standard chatbot terms: utterances (what the user types or says), intents (what the user means), entities (items referred to in utterances and intents) and contexts (data captured by prior intents).
Google DialogFlow orchestrates conversational flows using contexts. For example, if I want to link from an intent that asks "How would you rate the location of the hotel?" to a second intent that asks "How would you rate the facilities at the hotel?" I could attach a context called facilities-question as an output from the first intent and an input to the second intent. The second intent will then only match utterances where the facilities-question context is present, which means answers to the question asked by the first intent. If you're used to other frameworks this might seem a little indirect, compared to explicit links between intents, but it's very flexible and natural — matching intents to conversational context rather than to other intents.
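To make the mechanism concrete, here is a toy Python simulation of context gating, using the two hotel intents above. It only models the context bookkeeping — real DialogFlow also scores each utterance against an intent's training phrases, which this sketch omits:

```python
# A toy simulation (not the DialogFlow API) of context-gated intent matching:
# an intent is only eligible when all of its input contexts are active, and
# matching an intent activates its output contexts for later turns.

intents = [
    {"name": "ask_location_rating",
     "input_contexts": [],
     "output_contexts": ["facilities-question"],
     "response": "How would you rate the location of the hotel?"},
    {"name": "ask_facilities_rating",
     "input_contexts": ["facilities-question"],
     "output_contexts": [],
     "response": "How would you rate the facilities at the hotel?"},
]

def match_intent(active_contexts, intents):
    """Pick the eligible intent requiring the most input contexts, loosely
    mimicking DialogFlow's preference for context-specific intents."""
    eligible = [i for i in intents
                if all(c in active_contexts for c in i["input_contexts"])]
    return max(eligible, key=lambda i: len(i["input_contexts"]), default=None)

active = set()
turn1 = match_intent(active, intents)    # only the context-free intent matches
active.update(turn1["output_contexts"])  # facilities-question is now active
turn2 = match_intent(active, intents)    # the context-specific intent wins
```

Notice that nothing links the two intents directly: the second becomes reachable purely because the first activated its input context.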
This is what it looks like in DialogFlow (this is their official example for conversational flow). To connect 2 questions together you have to add 9 intents: 1 for the first question, 4 for its possible answers and 4 more for the second question's answers. If the second question was different for each of the first question's answers it would require 21 intents!
It’s exponential and there’s a formula for it.
If I have 2 questions, each of which has 4 possible answers, this gives a requirement of 21 intents (try it on Google's calculator). In general, with a possible answers per question and q questions on each path, you need 1 + a + a^2 + … + a^q = (a^(q+1) − 1)/(a − 1) intents. So if I want to model a typical troubleshooting conversation (say 12 questions, which on average have 3 possible answers) it's 797,161 intents. Wow. In practice you wouldn't need all those intents, because the conversation wouldn't diverge as extremely, but you would still require thousands of intents. Which is a big problem when DialogFlow only shows 20 intents at a time!
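You can check the arithmetic with a few lines of Python — each question is one intent, and every answer on every path needs its own follow-up intent, so the total is a geometric series:

```python
# Intent count for a fully branching decision tree of questions:
# with `answers` possible answers per question and `questions` questions
# on each path, the total is 1 + a + a^2 + ... + a^q.

def intent_count(answers: int, questions: int) -> int:
    return sum(answers ** level for level in range(questions + 1))

print(intent_count(4, 2))   # 2 questions, 4 answers each -> 21
print(intent_count(3, 12))  # 12 questions, 3 answers each -> 797161
```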
How to solve this complexity problem?
The good news is that orchestrating complex logical flows is not a new problem. It was the focus of the last AI boom and eXvisory.ai is an Internet reboot of state-of-the-art deep logic AI from that boom. A great way to intuitively understand our approach is to compare logic editors.
Many chatbot frameworks have fancier conversational logic editors than DialogFlow. The one above from BotStar looks great (I haven’t tried it) but as you can see it models the same raw decision tree of intents. It helps a lot with comprehension, up to a point — but there are only six intents on-screen. For deep logic conversations requiring tens of thousands of intents (like our troubleshooting example) this kind of editor is like peering through a tiny window into a gigantic decision forest.
eXvisory’s editors model logic at a deeper level, capturing the way human experts think. The editor shown above is optimised for product support. It’s a fault containment hierarchy. Developers place faults within fault groups, select from a palette of simple rules (faults, eliminators and tests) and fit these rules into the hierarchy — a lot like solving a jigsaw puzzle. Conversation flows from left-to-right and top-to-bottom, asking the right questions in the right order and providing helpful explanations.
Deep logic networks encode complex logic much more efficiently. A deep logic network with a few hundred rules corresponds to tens of thousands of interconnected IF…THEN statements or intents — making it far easier to extend and maintain at scale. And because the editors directly model deep logic (in this case the process of elimination) they naturally guide the developer. Complexity increases linearly, which is a fancy way of saying that adding lots of logic doesn’t lead to complexity meltdown. The eXvisory network above, with 630 logic rules, is nearly as easy to comprehend, extend and maintain as when it had 63 rules.
So why use DialogFlow?
Press a button and the eXvisory deep logic editors generate cloud-hosted web chatbots. But if you try out our pilot mobile phone or tablet troubleshooter or view some sample chats you’ll notice they don’t accept natural language inputs — they are completely scripted — and can’t be accessed from apps like Google Assistant, Telegram or Kik.
Why integrate? Natural language is definitely not an end in itself (it adds a LOT of complexity and is very hard to do well) but it can make conversations more efficient if the user can say what they want in their own words. DialogFlow adds this natural language understanding (NLU) plus one-click deployment of our bot to Google Assistant, Telegram and Kik.
Putting it all together
Connecting all the pieces together is surprisingly easy, through the magic of webhooks (which are just well-defined HTTP callbacks).
Most assistant and messaging apps define webhook APIs (typically HTTP POSTs with well-defined JSON payloads), which they can call out to in order to obtain additional information or interact with bots. DialogFlow knows how to field webhook requests from popular apps and translate them into DialogFlow intents. So a Telegram user may enter some text, which is sent to a DialogFlow-hosted webhook, where it triggers a DialogFlow intent and returns that intent's response (shown above). This integration layer is one of the main capabilities of DialogFlow and other frameworks, along with Natural Language Understanding (though it's nowhere near as clever as NLU).
DialogFlow defines its own webhook API so it too can call out to external systems. We implemented this webhook and connected it to the REST API generated by our deep logic editors and used by our web chatbots. So now a DialogFlow intent triggered by text from Telegram or Google Assistant can call out to eXvisory for a deep logic response.
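Here is a minimal sketch of such a fulfillment handler. The DialogFlow v2 webhook fields used (queryResult, queryText, fulfillmentText, outputContexts) are real; the deep-logic call is a hypothetical stand-in for the eXvisory REST API, injected as a function so the sketch stays self-contained:

```python
# Sketch of a DialogFlow fulfillment handler that forwards the user's answer
# to an external deep-logic REST API and replies with its next question.

def handle_webhook(request_json, call_deep_logic_api):
    query = request_json["queryResult"]
    user_answer = query["queryText"]
    contexts = query.get("outputContexts", [])

    # Forward the answer plus conversational context to the deep logic
    # engine (a hypothetical API, passed in as a plain function here).
    engine_reply = call_deep_logic_api(answer=user_answer, contexts=contexts)

    # Echo the contexts back so DialogFlow keeps them alive for the next turn.
    return {
        "fulfillmentText": engine_reply["next_question"],
        "outputContexts": contexts,
    }

# Example with a stubbed deep-logic engine:
def fake_engine(answer, contexts):
    return {"next_question": f"You said '{answer}'. Does the phone charge?"}

response = handle_webhook(
    {"queryResult": {"queryText": "battery drains fast", "outputContexts": []}},
    fake_engine,
)
```

In production the stub would be an HTTP POST to the deep-logic service, but the control flow — answer in, next question out — is the whole integration.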
The resulting eXvisory-DialogFlow integration is interesting, because it’s a hybrid of natural language and deep logic.
We use NLU to model the early conversation in a more efficient and natural manner. Without NLU we have to ask a scripted series of questions to establish the make, model and operating system of the device we are trying to troubleshoot. With NLU we can ask "What device are you having problems with?" and then use an intent to match "iPad 3" and infer that the make is "Apple", the model is "iPad 3", it's a "tablet" (not a phone) and that it runs the "iOS" operating system. During this introductory phase we store all the captured data in a conversational context called "context_triage". Likewise, instead of asking the user to select from a laundry list of problem symptoms we let them describe their symptoms and match them to intents.
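That inference step can be sketched as a simple lookup once NLU has matched the device entity (the table entries and names below are illustrative, not eXvisory's actual data):

```python
# Once NLU matches a device entity like "iPad 3", a lookup infers the
# remaining facts (make, device type, OS) for the triage context.

DEVICE_FACTS = {
    "iPad 3":    {"make": "Apple",   "type": "tablet", "os": "iOS"},
    "iPhone 7":  {"make": "Apple",   "type": "phone",  "os": "iOS"},
    "Galaxy S8": {"make": "Samsung", "type": "phone",  "os": "Android"},
}

def triage_context(device_model):
    """Build the context_triage parameters from a matched device entity."""
    facts = DEVICE_FACTS.get(device_model)
    if facts is None:
        return None  # unknown device: fall back to scripted make/model questions
    return {"model": device_model, **facts}

print(triage_context("iPad 3"))
# {'model': 'iPad 3', 'make': 'Apple', 'type': 'tablet', 'os': 'iOS'}
```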
Once the conversation reaches the point where it’s time for the expert troubleshooter to take over we copy the “context_triage” data into a new context called “context_exvisory”. From then on most user responses match special eXvisory intents and are forwarded to the eXvisory-DialogFlow webhook so that conversation follows the deep logic troubleshooting paths created using the eXvisory visual editors. The “context_exvisory” context is passed to the webhook so that eXvisory does not reprompt for answers the user has already given.
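A hedged sketch of that hand-over, in webhook-response form — the context name follows DialogFlow v2's resource-path convention, while the session path and lifespan value are illustrative:

```python
# Copy the triage parameters into a new context_exvisory context in the
# webhook response, so the deep logic engine receives everything already
# established and never reprompts for it.

def hand_over(session, triage_params, reply_text):
    return {
        "fulfillmentText": reply_text,
        "outputContexts": [{
            "name": f"{session}/contexts/context_exvisory",
            "lifespanCount": 50,  # keep it alive for a long troubleshooting run
            "parameters": dict(triage_params),  # copy, don't share
        }],
    }

resp = hand_over(
    "projects/my-agent/agent/sessions/abc123",  # hypothetical session path
    {"make": "Apple", "model": "iPad 3", "os": "iOS"},
    "OK, let's troubleshoot your iPad 3.",
)
```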
NLU is also used for digression from eXvisory troubleshooting paths, so the user can ask the bot to explain or repeat the last question, to go back to the previous question or to restart the troubleshooting session.
Is it worth it?
Absolutely. The hybrid eXvisory-DialogFlow bot has 65 intents (and 630 eXvisory deep logic rules) but it can already troubleshoot well over 100 commonly encountered but hard-to-diagnose problems with mobile phones and tablets. The deep logic rules took about six person-months to build, despite most troubleshooting ‘happy paths’ requiring over a dozen questions and answers. Implementing this level of expertise using chained-together intents or IF…THEN statements would be almost impossible.
At eXvisory we’re pioneering a new type of capability to be added to frameworks like Google DialogFlow, IBM Watson, Amazon Lex, Microsoft Bots and SAP Recast — the capability to add deep logic skills into the mix, alongside NLU and multi-channel integration. It’s the key enabler for AI chatbots to step up from simple applications to professional-level skills, like product support, financial advice or medical diagnostics.
Our hybrid bots are alpha versions (so don’t be upset if they screw up or can’t fix your phone or tablet problem) but you can try them out:
- Google Assistant (iOS and Android): Talk to Mobile troubleshooter
- Telegram: t.me/exvisory_mobile_bot
- Kik: kik.me/exvisory_mobile (not visible in Kik store)
Build your own deep logic chatbot
To be notified of more deep logic articles follow me on my Medium profile page. Or say hello at firstname.lastname@example.org if you’d like access to our dev documentation, online tutorials, a web demo — or a free dev instance to build your own deep logic chatbot. If you enjoyed this article please recommend it to others by clapping madly (below) and sharing or linking to it. You can also leave feedback below.