It’s like voice-enabling an API
The first thing you need to do is list out the phrases a user could reasonably be expected to speak to your application. This can feel paralyzing because the space of possible utterances is large, but it forces you to think through the experience up front.
Let’s look at the SpaceX API to see this in practice.
You can paste this endpoint into a browser and it will return every launch since 2006. That’s fine if a user says ‘show me all launches’…but we can make the experience richer. Looking at the API docs, you can see it will filter by launch date and by the rocket used for the mission. Immediately, you can support these sorts of phrases:
Find Launches [this year]
Show me all [Falcon Heavy] launches [last year]
List all [upcoming] [Falcon 9] launches
Only show [previous] launches
There would be more variation, but you quickly realize what your inputs are. In this case, let the user filter the API request by these four inputs: launchDate, rocket, upcoming, and previous.
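To make those four inputs concrete, here’s a minimal Python sketch that translates them into a SpaceX REST API request. The endpoint paths (`/launches/past`, `/launches/upcoming`), query-parameter names (`rocket_id`, `start`, `end`), and the rocket-ID mapping are assumptions based on the public v3 API docs, not something prescribed by the article:

```python
# Sketch: map the four user inputs onto a SpaceX v3 API request.
# Endpoint paths and parameter names are assumptions from the public docs.

BASE = "https://api.spacexdata.com/v3/launches"

# Spoken rocket name -> assumed API rocket_id value.
ROCKET_IDS = {
    "Falcon 1": "falcon1",
    "Falcon 9": "falcon9",
    "Falcon Heavy": "falconheavy",
    "Starship": "starship",
}

def build_launch_request(rocket=None, start=None, end=None,
                         upcoming=False, previous=False):
    """Return (url, params) for a filtered launch query."""
    url = BASE
    if upcoming:
        url += "/upcoming"
    elif previous:
        url += "/past"
    params = {}
    if rocket:
        params["rocket_id"] = ROCKET_IDS[rocket]
    if start:
        params["start"] = start   # ISO dates, e.g. "2019-01-01"
    if end:
        params["end"] = end
    return url, params
```

So ‘show me all upcoming Falcon 9 launches’ would reduce to `build_launch_request(rocket="Falcon 9", upcoming=True)`, and ‘show me all launches’ to a call with no arguments at all.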
It’s our job to process natural language user queries into logical visual and audio responses. Breaking down the interactions, I came up with this model for queries to the API for a single intent: GetLaunches
Now, if you’ve never built an app for Bixby…that’s fine. Just by reading the syntax for the ‘GetLaunches’ action, you can see how this works. What the code says is that the user doesn’t need to provide any of the inputs (they’re optional by default), and that you want to bring back launch information: output(Launch).
If you look at an input like ‘LaunchDate’, it functions as a natural-language date. You can say ‘next Tuesday’, ‘May 20th 2020’, or ‘last year’, and the platform understands it and converts it into a format a developer can work with, like ‘2010-12-08T15:43:00.000Z’. The same holds true on other platforms. You may have also noticed some code about ‘default-select’. That’s a way of telling Bixby to pick the first date instead of prompting the user to disambiguate (fine for our purposes).
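Once the platform hands you an ISO timestamp like the one above, turning it into an API filter value is a one-liner. A minimal Python sketch (the `launch_year` filter name is an assumption from the SpaceX v3 API docs, and the function name is hypothetical):

```python
from datetime import datetime

def to_launch_year(iso_ts: str) -> str:
    """Turn a platform-supplied timestamp like '2010-12-08T15:43:00.000Z'
    into a value for the (assumed) SpaceX 'launch_year' filter."""
    # datetime.fromisoformat (Python 3.7+) doesn't accept a trailing 'Z',
    # so rewrite it as an explicit UTC offset first.
    dt = datetime.fromisoformat(iso_ts.replace("Z", "+00:00"))
    return str(dt.year)
```

The point is that you never parse ‘last year’ yourself; the platform does the natural-language work and you only handle a well-formed timestamp.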
Now that you’ve modeled an interaction, we can assign some language to it and bring it to life!
It’s important to provide clean input, so only tag the words in the phrase that will become inputs for your action. In this case, we want to tag the SpaceX rockets: Falcon 1, Falcon 9, Falcon Heavy, and Starship (coming in 2021). We also provide training entries with dates (‘next month’) and with no inputs at all, such as ‘Show SpaceX launches’. How many examples you need really depends on your use case. Provide ‘as many as you need’, meaning you’ll be testing your model to see whether it correctly parses the language, routes it to the right intent, and extracts the inputs.
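To make the tagging idea concrete, here’s a deliberately simplified Python sketch of the kind of slot extraction the platform performs for you once it’s trained. This is an illustration only, not how Bixby’s NLU actually works; the function and list names are hypothetical:

```python
import re

# Rocket slot values, longest name first so 'Falcon Heavy' isn't
# shadowed by a shorter match.
ROCKETS = ["Falcon Heavy", "Falcon 9", "Falcon 1", "Starship"]

def tag_rocket(utterance: str):
    """Return the rocket value tagged in the phrase, or None if the
    phrase carries no rocket input (e.g. 'Show SpaceX launches')."""
    for name in ROCKETS:
        if re.search(re.escape(name), utterance, re.IGNORECASE):
            return name
    return None
```

Testing your model amounts to checking exactly this kind of behavior at scale: the phrase routes to GetLaunches, and each tagged word lands in the right input.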
Good developers and designers have empathy for their users. Understanding that conversational experiences create ambiguity is critical to building intuitive interactions that serve the user. Think of these interactions as a journey: at first, users need lots of cues to know what they can and cannot do. Eventually, you can take off the training wheels and introduce more functionality as they become comfortable.
Consider that the user can say ANYTHING, so how do you handle unbounded input?
Some rules to apply:
- Build a default interaction that is activated when the user says ‘help’, ‘start over’, ‘how does this work’, etc. Bonus points if you make the response contextual.
- If you collect input from the user, ask for it naturally without being too verbose, and expand your explanation when they’re having trouble. Be dynamic: don’t repeat application responses verbatim; alter the phrasing when re-prompting.
- Try to ground the conversation. We do this as humans: sometimes we get off course and want to get back to a productive exchange. When you don’t understand the user, or they provide no input, consider explaining what’s happening and why. Give them the option to exit, so no interaction ends in a dead end. For example, “Looks like I’m having trouble collecting your preference, would you like to continue?”.
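The rules above can be sketched as a simple re-prompt ladder: brief at first, more explicit on a retry, and finally grounding the conversation with a way out. A minimal Python illustration (the prompt text and function name are hypothetical, not platform API):

```python
# Escalating prompts: short ask, fuller explanation, then a grounding
# question that offers the user an exit.
REPROMPTS = [
    "Which rocket are you interested in?",
    "Sorry, I didn't catch that. You can say Falcon 1, Falcon 9, "
    "Falcon Heavy, or Starship.",
    "Looks like I'm having trouble collecting your preference. "
    "Would you like to continue?",
]

def next_prompt(failed_attempts: int) -> str:
    """Pick a prompt based on how many times input collection failed,
    capping at the grounding/exit prompt so there is no dead end."""
    return REPROMPTS[min(failed_attempts, len(REPROMPTS) - 1)]
```

Each retry changes the phrasing instead of repeating the same response, and the ladder never runs past the grounding question.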
Part 2 — Stay tuned, I’ll be posting it soon.