How Much Data Do You Need To Train A Chatbot and Where To Find It?


Chris Knight

Most providers and vendors say you need plenty of data to train a chatbot to handle customer support or other queries effectively. But how much is plenty, exactly? We take a look around to see how various bots are trained and what data they use.

Recent bot news saw Google reveal, in a research paper, that its latest chatbot, Meena, was trained on some 341GB of data. Meena is “a multi-turn open-domain chatbot trained end-to-end on data mined and filtered from public domain social media conversations.” That puts it a few steps beyond the usual scripted bot, or even those that claim AI smarts.

The 38-page scientific paper highlights the advanced nature of Meena, but any business looking to train its own bot faces the same opening question: how much training is enough?

For a very narrowly focused or simple bot, such as one that takes reservations or tells customers about opening times or what’s in stock, there’s no need to train it. A script and an API link to a website can provide all the information perfectly well, and thousands of businesses find that these simple bots save enough working time to make them valuable assets.
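To make the distinction concrete, here is a minimal Python sketch of the kind of untrained, scripted bot described above. The opening hours and stock figures are hypothetical stand-ins for data a real bot would fetch from a website API; nothing is learned, it is just keyword rules.

```python
# Hypothetical data a real bot would fetch from the business's website API.
OPENING_HOURS = "We are open 9am to 5pm, Monday to Saturday."
STOCK = {"kettle": 12, "toaster": 0}


def reply(message: str) -> str:
    """Answer with fixed keyword rules; nothing here is learned from data."""
    text = message.lower()
    if "open" in text or "hours" in text:
        return OPENING_HOURS
    for item, count in STOCK.items():
        if item in text:
            if count:
                return f"Yes, we have {count} {item}s in stock."
            return f"Sorry, {item}s are out of stock."
    # Anything outside the script falls through to a fixed fallback.
    return "Sorry, I can only help with opening times and stock."
```

The weakness is just as visible: a customer who asks "what time do you shut?" gets the fallback, because no rule mentions "shut". Once questions start arriving in phrasings the script never anticipated, training starts to pay off.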

But when a bot needs to answer a range of questions that people could phrase in many different ways, training becomes essential to teach it, through natural language processing (NLP), to understand what is being asked of it.

These bots can be trained on data you already have in the business, such as digitised call centre transcripts or email and Messenger requests, to provide intent variation, classification and recognition. To see how data capture can be done, see this insightful piece from a Japanese university, whose researchers collected hundreds of questions and answers from logs to train their bots.
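As a rough illustration of how such logs might be turned into intent recognition, the Python sketch below trains a tiny multinomial Naive Bayes classifier on a handful of hypothetical question/intent pairs. It is not how any of the vendors mentioned here work, and production systems use far more data and stronger models, but the principle of learning intents from labelled examples is the same.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labelled examples, e.g. mined from chat or call centre logs.
TRAINING_DATA = [
    ("what time do you open", "opening_hours"),
    ("are you open on sunday", "opening_hours"),
    ("when do you close today", "opening_hours"),
    ("where is my order", "order_status"),
    ("has my parcel shipped yet", "order_status"),
    ("track my delivery", "order_status"),
    ("i want a refund", "refund"),
    ("how do i return this item", "refund"),
    ("can i get my money back", "refund"),
]


def tokenize(text):
    return text.lower().replace("?", " ").split()


def train(examples):
    word_counts = defaultdict(Counter)  # intent -> word frequencies
    intent_counts = Counter()           # intent -> number of examples
    vocab = set()
    for text, intent in examples:
        words = tokenize(text)
        word_counts[intent].update(words)
        intent_counts[intent] += 1
        vocab.update(words)
    return word_counts, intent_counts, vocab


def classify(text, model):
    word_counts, intent_counts, vocab = model
    total = sum(intent_counts.values())
    scores = {}
    for intent, n in intent_counts.items():
        denom = sum(word_counts[intent].values()) + len(vocab)
        score = math.log(n / total)  # log prior
        for word in tokenize(text):
            # Add-one smoothing so an unseen word doesn't zero out the score.
            score += math.log((word_counts[intent][word] + 1) / denom)
        scores[intent] = score
    return max(scores, key=scores.get)
```

For example, `classify("what time do you close", train(TRAINING_DATA))` picks `opening_hours`, even though that exact sentence never appears in the training data, because its words occur most often in that intent's examples.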

KLM used some 60,000 customer questions to train BlueBot, the airline’s chatbot. Businesses like Babylon Health can gain useful training data from unstructured data, but the quality of that data needs to be firmly vetted, as the company noted in a 2019 blog post.

Others have to go further out of their way to find unique information to deliver top-notch customer service. The developers of the Rose chatbot at the Las Vegas Cosmopolitan Hotel gathered theirs in person: “Over the course of 12 weeks, we met with every department within The Cosmopolitan to learn the secrets and surprises the typical guest wouldn’t find on their own. With a ton of information, the team leveraged the user experience team to identify key conversation categories that would help guests experience the property through their interests.”

Or you can buy in data suited to your vertical or market from services like Lionbridge, which provides business-focused data across a broad range of categories. There are also many popular datasets available to any business.

Whatever your chatbot, finding the right type and quality of data is key to giving it the grounding it needs to deliver a high-quality customer experience. With the right data, you can train bots on platforms like SnatchBot using simple learning tools, or use the platform’s pre-trained models for specific use cases.

Generally, the more data you have, the better, and if it is labelled with intents and entities (or similar identifiers), better still. But even raw data can be useful in training bots to help customers.
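To show the difference between raw and labelled data, here is one hypothetical way to represent a training example in Python: an intent plus entity annotations given as character offsets. The field names, intent and entity labels are illustrative assumptions, not any particular platform’s format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Entity:
    start: int   # character offset where the entity begins
    end: int     # character offset just past the end of the entity
    label: str   # entity type, e.g. "time"


@dataclass
class Example:
    text: str
    intent: Optional[str] = None  # None for raw, unlabelled data
    entities: List[Entity] = field(default_factory=list)

    def entity_values(self) -> Dict[str, str]:
        """Map each entity label to the text it covers."""
        return {e.label: self.text[e.start:e.end] for e in self.entities}


# A fully labelled example: intent plus entity spans.
labelled = Example(
    text="book a table for 4 at 7pm",
    intent="make_reservation",
    entities=[Entity(17, 18, "party_size"), Entity(22, 25, "time")],
)

# A raw example, e.g. straight from a chat log, with no labels yet.
raw = Example(text="do you have gluten-free options")

# Raw examples are still worth keeping: they can be queued for labelling.
needs_labelling = [ex for ex in (labelled, raw) if ex.intent is None]
```

Here `labelled.entity_values()` yields `{"party_size": "4", "time": "7pm"}`, which is exactly the structured detail a reservation bot needs, while the raw example carries only the text until someone labels it.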

Hopefully, this gives you some insight into the volume of data required to build a chatbot or train a neural net. The best bots also learn from the new questions they are asked, either through supervised training, where people review and label those questions, or through automated AI-based training, and as AI improves, self-learning bots could rapidly become the norm.
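The supervised route can be sketched as a review loop: questions the bot is unsure about are handed to a person, who labels them so they can be added to the training data. In this hypothetical Python sketch, the confidence threshold and the toy classifier are assumptions standing in for a real trained model.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical cut-off: below this confidence the bot defers to a person.
CONFIDENCE_THRESHOLD = 0.6


def toy_classifier(message: str) -> Tuple[str, float]:
    """Stand-in for a trained model: returns (intent, confidence)."""
    if "refund" in message.lower():
        return "refund", 0.9
    return "unknown", 0.2


class ReviewLoop:
    def __init__(self, classify: Callable[[str], Tuple[str, float]]):
        self.classify = classify
        self.review_queue: List[str] = []               # awaiting human labels
        self.training_data: List[Tuple[str, str]] = []  # (text, intent) pairs

    def handle(self, message: str) -> Optional[str]:
        intent, confidence = self.classify(message)
        if confidence < CONFIDENCE_THRESHOLD:
            self.review_queue.append(message)
            return None  # hand the conversation to a human agent
        return intent

    def label(self, message: str, intent: str) -> None:
        """A reviewer assigns the correct intent; the pair becomes training data."""
        self.review_queue.remove(message)
        self.training_data.append((message, intent))
```

Periodically retraining the model on `training_data` closes the loop. How often to retrain, and where to set the threshold, are design choices that depend on traffic volume and on how costly a wrong answer is.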
