We worked full-time on our solution: we had 8 days to prototype an answer to our business problem through an iterative, Agile approach structured around short sprints, each culminating in a retrospective and backlog review.
So as not to overwhelm ourselves with the high volume of CRA forms that exist (1,045 in total), we focused our design efforts, for the purpose of our prototype, on 8 forms that we felt impacted our stakeholders most. They are also among the most common forms that the CRA receives:
- POAs (Power of Attorney)
- T2201 (Disability Tax Credit Application)
- Death Certificates
- Family Court Documents
- T1261 for Newcomers (Apply for Tax Number)
- Child Benefits Forms
- Authorization forms
We began prototyping our voice user interface, using a user-centred, agile approach. We revisited our brainstorming statement from earlier:
“What might we create for Canadians that helps them find the correct mailing address for their forms?”
We refined our statement even further, this time, applying the How Might We approach, thus making it more specific and inclusive of all our user groups. As a team, we asked ourselves:
“How might we develop a system that is iterative, responsive, and inclusive, to help Canadians find the correct mailing address for their forms and documents?”
Testing with someone from our target user group
We built a simple prototype and put it in front of someone from our user group. It wasn’t perfect by any stretch of the imagination, but that wasn’t the point. The point was merely to test our assumptions through this physical manifestation of our idea.
Using a watered-down Wizard-of-Oz method, we tested our simple prototype with a user, with one of us acting out the “role” of the voice bot and following a preliminary list of key questions that the Bot would and should ask:
- How can I help you today?
- Okay. Please say your form number or form name.
- Which province do you live in?
- Would you like me to text you the address?
- Do you have any more forms that you would like to mail?
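The wizard's script amounts to a short linear flow. A minimal Python stand-in is sketched below; the function and the prompt-to-answer pairing are our own illustration, not the prototype's actual code:

```python
# Illustrative sketch of the Wizard-of-Oz session: the "wizard" walks
# through a fixed list of prompts, and we pair each prompt with the
# participant's answer to produce a transcript for later review.

WIZARD_PROMPTS = [
    "How can I help you today?",
    "Okay. Please say your form number or form name.",
    "Which province do you live in?",
    "Would you like me to text you the address?",
    "Do you have any more forms that you would like to mail?",
]

def run_wizard(answers):
    """Pair each scripted prompt with the user's answer, in order,
    producing a transcript of the Wizard-of-Oz session."""
    return list(zip(WIZARD_PROMPTS, answers))
```

In a real session the wizard would loop back to the form-number question when the participant answers “yes” to the final prompt; the sketch keeps a single pass for clarity.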
One thing that is common across all kinds of design is the use of scenarios, and designing for them. The purpose of a good scenario is to bring an experience to life, drawing users into your world so they can experience it.
We tested the following scenario with users. With this scenario, users had to put themselves into the hypothetical world of one of our key user groups, newcomers.
Scenario 1: You’re a newcomer and your first language is not English or French. You need to mail a completed CRA form but you don’t know the address. Determine the correct mailing address for your form.
- Users wanted the ability to submit the form electronically (through MyCRA or My Account) — This was not doable given the time constraints
- The Bot’s database should include forms and documents — This was totally doable!
I needed to familiarize myself with the Amazon VUI options that we had at our disposal. Between building an Alexa skill vs. building something more robust with Lex Bot, I needed to understand which would serve us better by weighing the pros and cons of both.
Alexa skill or Lex Bot? That was the question.
Alexa Skills are similar to Amazon Lex voice bots. Both let you create voice-activated digital assistants.
Alexa is a consumer service available to the general public. Anyone can build a custom Alexa skill and add their voice assistant features to the same Alexa ecosystem and devices that people use around the world. One of the downsides to this approach is the fact that Alexa skills already come equipped with the pre-built Alexa personality. It’s not a blank canvas.
Amazon Lex, on the other hand, is an Amazon web service that you can use to build your own private, specialized voice assistant that acts much like Alexa, but without Alexa's default behaviour and without necessarily being publicly available. It supports both voice and text and can be deployed across mobile and messaging platforms.
So, with this in mind, we decided to focus our efforts on building the voice bot ourselves. As such, we settled on our solution: We would design a conversational voice bot using Amazon’s Lex Bot service.
This way we’d have full control over shaping the bot and designing its personality and traits. That means, no wake words, no default behaviour from Alexa. We wanted to create a brand new experience for Canadians, one that possessed the natural conversational qualities of Alexa, but wasn’t Alexa. Building a custom Alexa skill would be an added feature for a later date.
It was quick and easy to get started, and the relatively low start-up costs made the Lex platform appealing. There were also significant short-term and long-term cost benefits to using AWS Lambda: the AWS Free Tier alone includes 1 million free requests and up to 3.2 million seconds of compute time per month with AWS Lambda. This is why, within a matter of two days, we had the shell (i.e., the basic interaction model) of our voice bot up and running, and to date, we have spent just under $10 to keep the Bot running.
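A Lex bot typically hands fulfillment off to a Lambda function. The sketch below shows what a minimal Python fulfillment handler for an address-lookup intent could look like, assuming a `GetAddress` intent with `FormNumber` and `Province` slots and using the Lex (V1) event and response format; the intent name, slot names, and the address table are illustrative placeholders, not real CRA mailing addresses:

```python
# Hypothetical address table for illustration only -- NOT real CRA
# mailing addresses.
ADDRESSES = {
    ("t2201", "ontario"): "Example Tax Centre, 123 Example St, Sudbury ON",
    ("t1261", "ontario"): "Example Tax Centre, 456 Sample Ave, Ottawa ON",
}

def lambda_handler(event, context):
    """Minimal Lex V1 fulfillment handler: read the slots, look up the
    mailing address, and return a closing response."""
    slots = event["currentIntent"]["slots"]
    form = (slots.get("FormNumber") or "").lower()
    province = (slots.get("Province") or "").lower()
    address = ADDRESSES.get((form, province))
    content = (
        f"Mail your {form.upper()} to: {address}"
        if address
        else "Sorry, I couldn't find a mailing address for that form."
    )
    # Lex V1 expects the response wrapped in a dialogAction envelope.
    return {
        "dialogAction": {
            "type": "Close",
            "fulfillmentState": "Fulfilled" if address else "Failed",
            "message": {"contentType": "PlainText", "content": content},
        }
    }
```

In the Free Tier cost model above, each address lookup is one Lambda request, which is why a low-traffic bot like this one can run for months on pocket change.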
The obvious drawback with chatbots is their rigidity. Chatbots more or less follow a script, meaning user experiences and interactions are limited by the user's intents and utterances, which naturally follow the general logic flow that is mapped out. Making the bot more conversational (at least within the linear confines of a non-AI-enabled chatbot structure) was a priority. As such, we set about giving it what every human character operates on: a personality.
As we only had 12 days to come up with a working prototype, I focused on the following core system elements, which guided my voice design process:
- Making it conversational, natural, and personable
- Error prompts
These system elements would serve as the foundation for our Bot’s personality and its conversational ability.
What’s a persona anyway? It’s the face you show to the world. Within VUI design, personas keep the conversation writing consistent and clear and help define the communication style of the voice interface.
We developed a system persona specifically designed to highlight the personality of the voice. We knew we wanted it to be both conversational and personable: users would be relying on this bot for important information, so the interaction needed to feel easy and relatable. This would put users at ease and allow for a smoother, more light-hearted interaction.
Now that our system had a personality and I had fleshed out a collection of user stories pairing user needs with system features, I moved on to designing the logical components of the conversation and mapping out the flow.
I developed the system scripts in the form of a chat tree diagram. Using Google Sheets, I mapped out the system prompts and responses, along with sample user utterances. The script document is organized by intent: what the user says and how the bot replies, or what the bot says and how the user replies.
View the full script document here.
The next step was to define the statements that the Bot could take in as inputs. Based on the sample dialogs, the following intents captured all of the initial core functionality:
- Get Address Intent
- Back Word Intent
- Next Word Intent
- Repeat Word Intent
- Restart Spelling Intent
- Spell Intent
- Stop Spelling Intent
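Each of these intents is trained on a set of sample utterances. As a rough sketch of how the mapping works, the snippet below pairs each intent with a couple of illustrative utterances and routes an input by exact match; the CamelCase names and sample phrases are our own assumptions (the real training phrases lived in the Lex console and the script document), and Lex's actual NLU matches far more flexibly than this:

```python
# Illustrative intent registry: each intent maps to sample utterances.
# The utterances here are placeholders, not the bot's real training data.
INTENTS = {
    "GetAddressIntent":      ["where do i mail my form", "i need a mailing address"],
    "BackWordIntent":        ["go back", "back"],
    "NextWordIntent":        ["next", "next word"],
    "RepeatWordIntent":      ["repeat that", "say that again"],
    "RestartSpellingIntent": ["start spelling over", "restart spelling"],
    "SpellIntent":           ["spell it", "can you spell that"],
    "StopSpellingIntent":    ["stop spelling", "stop"],
}

def match_intent(utterance):
    """Naive exact-match router standing in for Lex's NLU: return the
    first intent whose sample utterances contain the input."""
    text = utterance.strip().lower()
    for intent, samples in INTENTS.items():
        if text in samples:
            return intent
    return None  # Lex would fall back to an error prompt here
```

The spelling-related intents (back, next, repeat, restart, stop) exist because spelling a form name letter by letter over voice needs navigation controls, much like a cursor in a text field.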