
Build a custom speech-to-text model with speaker diarization capabilities – IBM Developer



In this code pattern, learn how to train custom language and acoustic speech-to-text models that, given a corpus file and audio recordings of a meeting or classroom, transcribe the audio and produce speaker-diarized output.


One feature of the IBM® Watson™ Speech to Text service is the ability to detect different speakers in an audio file, also known as speaker diarization. This code pattern demonstrates that capability by training a custom language model with a corpus text file, which teaches the model 'out of vocabulary' words, and a custom acoustic model with the audio files, which adapts the model to the speakers' accents, all within a Python Flask runtime.
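
The two training steps described above can be sketched with the ibm-watson Python SDK (`pip install ibm-watson`). This is a minimal illustration, not the code pattern's own source; the model names, file paths, and base model choice are assumptions.

```python
def train_custom_models(apikey, service_url, corpus_path, audio_path):
    """Create and train a custom language model (from a corpus file) and a
    custom acoustic model (from an audio file) on Watson Speech to Text."""
    # Imported lazily so this sketch can be loaded without the SDK installed.
    from ibm_watson import SpeechToTextV1
    from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

    stt = SpeechToTextV1(authenticator=IAMAuthenticator(apikey))
    stt.set_service_url(service_url)

    # 1. Custom language model: the corpus teaches the service
    #    out-of-vocabulary words before training starts.
    lang_model = stt.create_language_model(
        "diarization-language-model", "en-US_BroadbandModel").get_result()
    with open(corpus_path, "rb") as corpus:
        stt.add_corpus(lang_model["customization_id"], "meeting-corpus", corpus)
    stt.train_language_model(lang_model["customization_id"])

    # 2. Custom acoustic model: the audio adapts the service to the
    #    speakers' accents and recording conditions.
    acoustic_model = stt.create_acoustic_model(
        "diarization-acoustic-model", "en-US_BroadbandModel").get_result()
    with open(audio_path, "rb") as audio:
        stt.add_audio(acoustic_model["customization_id"], "meeting-audio",
                      audio, content_type="audio/wav")
    stt.train_acoustic_model(acoustic_model["customization_id"])

    return (lang_model["customization_id"],
            acoustic_model["customization_id"])
```

Both training calls are asynchronous on the service side; a real application would poll the model status until it reports `available` before transcribing.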

After completing the code pattern, you will understand how to:

  • Train a custom language model with a corpus file
  • Train a custom acoustic model with audio files from the bucket
  • Transcribe the audio files from the bucket and get a speaker diarized textual output
  • Store the transcript in the bucket
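
To show what "speaker diarized textual output" means concretely, here is a small, self-contained sketch that turns a Watson Speech to Text response containing `speaker_labels` into a readable transcript. The sample response below is mock data in the service's documented shape, not real transcription output.

```python
def diarized_transcript(response):
    """Group words by speaker, matching word start times from the result
    timestamps against the parallel speaker_labels list."""
    # Map each word's start time to the word itself.
    words = {}
    for result in response.get("results", []):
        for word, start, _end in result["alternatives"][0]["timestamps"]:
            words[start] = word

    lines, current_speaker, current_words = [], None, []
    for label in response.get("speaker_labels", []):
        word = words.get(label["from"])
        if word is None:
            continue
        if label["speaker"] != current_speaker:
            if current_words:
                lines.append(
                    f"Speaker {current_speaker}: {' '.join(current_words)}")
            current_speaker, current_words = label["speaker"], []
        current_words.append(word)
    if current_words:
        lines.append(f"Speaker {current_speaker}: {' '.join(current_words)}")
    return "\n".join(lines)


# Mock response in the service's documented shape.
sample = {
    "results": [{"alternatives": [{
        "transcript": "hello there how are you",
        "timestamps": [["hello", 0.0, 0.4], ["there", 0.4, 0.8],
                       ["how", 1.2, 1.4], ["are", 1.4, 1.6],
                       ["you", 1.6, 1.8]],
    }], "final": True}],
    "speaker_labels": [
        {"from": 0.0, "to": 0.4, "speaker": 0},
        {"from": 0.4, "to": 0.8, "speaker": 0},
        {"from": 1.2, "to": 1.4, "speaker": 1},
        {"from": 1.4, "to": 1.6, "speaker": 1},
        {"from": 1.6, "to": 1.8, "speaker": 1},
    ],
}

print(diarized_transcript(sample))
# → Speaker 0: hello there
#   Speaker 1: how are you
```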


Custom speech-to-text model diarization flow

  1. The user uploads a corpus file to the application.
  2. The extracted audio from the previous code pattern is retrieved from IBM Cloud Object Storage.
  3. The corpus file as well as the extracted audio are uploaded to the Watson Speech To Text service to train the custom model.
  4. The downloaded audio file from the previous code pattern is transcribed with the custom speech-to-text model, and the text file is stored in IBM Cloud Object Storage.
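
Steps 2 and 4 of the flow can be sketched as one function that reads the audio from IBM Cloud Object Storage, transcribes it with both custom models, and writes the transcript back. The service and storage clients are passed in (an `ibm_watson.SpeechToTextV1` instance and an `ibm_boto3` client in practice); the bucket and object names are placeholders, not values from the code pattern.

```python
def transcribe_and_store(stt, cos, bucket, audio_key, transcript_key,
                         lang_id, acoustic_id):
    """Transcribe audio from a COS bucket with the custom models and store
    the resulting transcript in the same bucket."""
    # Retrieve the extracted audio from IBM Cloud Object Storage.
    audio = cos.get_object(Bucket=bucket, Key=audio_key)["Body"].read()

    # speaker_labels=True requests diarized output; the two customization
    # ids point the service at the trained language and acoustic models.
    response = stt.recognize(
        audio=audio,
        content_type="audio/wav",
        speaker_labels=True,
        language_customization_id=lang_id,
        acoustic_customization_id=acoustic_id,
    ).get_result()

    transcript = " ".join(
        r["alternatives"][0]["transcript"] for r in response["results"])

    # Store the transcript back in the bucket.
    cos.put_object(Bucket=bucket, Key=transcript_key,
                   Body=transcript.encode("utf-8"))
    return transcript
```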


Get detailed instructions in the README file. Those steps explain how to:

  1. Clone the GitHub repository.
  2. Create the Watson Speech to Text service.
  3. Add the credentials to the application.
  4. Deploy the application.
  5. Run the application.
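
For step 3, one common way to supply the credentials is an `ibm-credentials.env` file, which the Watson Python SDK can read automatically. The values below are placeholders; the exact mechanism for this application is described in its README.

```
SPEECH_TO_TEXT_APIKEY=<your-api-key>
SPEECH_TO_TEXT_URL=<your-service-url>
```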

This code pattern is part of the Extracting insights from videos with IBM Watson use case series, which showcases the solution on extracting meaningful insights from videos using Watson Speech to Text, Watson Natural Language Processing, and Watson Tone Analyzer services.


