Hint! TikTok is using NLP & NLU
Like many people, I downloaded TikTok at the beginning of quarantine on the recommendation of one of my friends. Immediately, the app figured out what I liked — parkour, fitness, mindfulness, but also science, skincare, and deadpan humor clips.
It was terrifying.
TikTok is owned by ByteDance, a Chinese internet technology company that was founded in 2012 by Yiming Zhang. ByteDance acquired Musical.ly, combining it with TikTok in August 2018, and TikTok as we know it was born. The app passed 2 billion downloads on the Play Store and App Store in late April.
User spending has escalated alongside the app’s growth, with China generating $331M in revenue, and the U.S. spending $86.5M, according to SensorTower. The pandemic has led it to grow even more, as people seek alternate forms of entertainment. See 1Q20 below.
TikTok can read its users. On the “For You” page, the app’s landing page, you see a variety of different videos in all sorts of different categories. It decides what you like based on view time (length of video watched), interaction (liking, commenting, following), and other variables for each of the videos that it shows you.
This is facilitated through a very powerful algorithm. The AI is based on creation, moderation, and interaction: users upload videos, viewers watch them, and the cycle repeats.
For the user, it at first appears to be completely random content, until it’s not.
TikTok’s evaluation of a user boils down into the following variables (from what I can tell):
Preference and Personality
- What is the user watching on the platform?
- What do they prefer (time spent, interaction)?
Location and Environment
- Where is this user located?
- What day of the week and what time of day is it?
- What is the profile of this user?
- What group do they align with?
A key part of this evaluation is keeping the user in the app for as long as possible to gauge engagement and behavior. They also want to understand what you like, and build out a score profile of the above variables.
So for me, I am a 22-year-old female located in the U.S., usually on the app at 9 pm or later, and I interact primarily with extreme sports, science, and skincare videos. They would score my watching according to that analysis.
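To make the idea of a score profile concrete, here is a minimal sketch of how watch time and interactions could be rolled up into per-category interest scores. This is my own illustration, not TikTok's actual method: the `INTERACTION_BONUS` weights, the event format, and the function names are all assumptions.

```python
from collections import defaultdict

# Hypothetical weights: how much each interaction adds on top of watch time.
INTERACTION_BONUS = {"like": 0.5, "comment": 0.75, "follow": 1.0}


def interest_profile(events):
    """Build per-category scores from viewing events.

    Each event is (category, fraction_watched, interactions), where
    fraction_watched is between 0 and 1 and interactions is a list of
    action names such as "like".
    """
    scores = defaultdict(float)
    for category, fraction_watched, interactions in events:
        scores[category] += fraction_watched
        scores[category] += sum(INTERACTION_BONUS.get(a, 0.0) for a in interactions)
    return dict(scores)


def top_interests(events, n=3):
    """Return the n highest-scoring categories, best first."""
    profile = interest_profile(events)
    return sorted(profile, key=profile.get, reverse=True)[:n]


events = [
    ("skincare", 1.0, ["like"]),  # watched to the end and liked
    ("cooking", 0.1, []),         # scrolled past almost immediately
    ("science", 0.9, []),         # watched most of it
]
print(top_interests(events, n=2))
```

In this toy version a fully watched, liked video counts for more than a video that was skipped, which is the basic intuition behind scoring by view time and interaction.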
The app also shows you ‘tester’ content: videos that are completely outside of the realm that it has designed for you. That’s how it learns what to share with a broader audience (and with you). If viewers like a tester video from a relatively unknown content creator, it gets blasted out to other viewers, and bam, a viral video is created.
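One common way to mix familiar content with occasional ‘tester’ content is an epsilon-greedy rule: most of the time show something from the user's known interests, and a small fraction of the time show something from outside them. I don't know whether TikTok does it this way; the sketch below is a stand-in technique, and `explore_rate` and the function name are assumptions.

```python
import random


def pick_next_video(preferred, testers, explore_rate=0.1, rng=random):
    """Pick the next video for the feed.

    With probability explore_rate, return a 'tester' video from outside the
    user's usual interests; otherwise return one from the preferred pool.
    """
    if testers and rng.random() < explore_rate:
        return rng.choice(testers)
    return rng.choice(preferred)


preferred = ["parkour_1", "skincare_7", "science_3"]
testers = ["woodworking_2", "opera_5"]
feed = [pick_next_video(preferred, testers, explore_rate=0.2) for _ in range(10)]
print(feed)
```

A higher `explore_rate` means more surprises in the feed and faster learning about new interests, at the cost of showing more videos the user may not care about.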
The below image is from AdAge, and it details how TikTok plans to target groups for advertisements. The algorithm works in a similar fashion. Who are these users, where are they, what are they using, and what do they like? All of those variables help to determine what TikTok will show you.
ByteDance, the parent company of TikTok, has an AI Lab that spans several different segments. They state “We are pushing the limits of machine intelligence each day by not only carrying out theoretical research, but our ideas can be practically tested and fast-tracked for product deployment”.
Most of these machine learning techniques help determine the success of a video. For a basic example, let’s say that when a new video is uploaded to TikTok, it’s analyzed with two systems:
- Natural Language Processing
- Computer Vision Technology
Natural Language Processing
NLP is a branch of AI that uses machine learning to understand and interpret human languages. The computer takes the text provided, breaks it down to extract the meaning behind the words, and collects data from it.
This is broken further down into lexical analysis, or examining the parts of speech. So here, we could look at this sentence: “Amazing dog rides a skateboard”
And break it down into its individual components.
- Articles (DET): a
- Nouns: dog | skateboard
- Noun Phrase (NP): Article + Noun | Article + Adjective + Noun
- Verbs: rides | riding | rode
- Verb Phrase (VP): NP V | V NP
- Adjective (ADJ): amazing | amaze | amazed
Then begins syntactic analysis, in which the computer uses a set of rewrite rules to construct a parse tree. According to the first rule, a Noun Phrase followed by a Verb Phrase constitutes a sentence.
- S = NP VP
- NP = DET N | DET ADJ N
- VP = V NP
This creates the parse tree, as shown below. This helps the computer break down the sentence to understand and process the content.
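The rewrite rules are small enough to try out by hand. The sketch below is a toy parser for S = NP VP and VP = V NP, reading the noun-phrase rule as article + noun or article + adjective + noun, as in the parts-of-speech list. Two things here are my assumptions: the tiny `LEXICON`, and the leading “the” I added to the example sentence so that it starts with an article and fits the rules.

```python
# Toy lexicon mapping each known word to its part of speech (an assumption).
LEXICON = {
    "the": "DET", "a": "DET",
    "amazing": "ADJ",
    "dog": "N", "skateboard": "N",
    "rides": "V",
}


def tag(text):
    """Turn a sentence into (part_of_speech, word) pairs."""
    return [(LEXICON[word], word) for word in text.lower().split()]


def parse_np(tokens, i):
    """NP = DET ADJ N | DET N. Return (tree, next_index) or None."""
    for pattern in (("DET", "ADJ", "N"), ("DET", "N")):
        chunk = tokens[i:i + len(pattern)]
        if tuple(pos for pos, _ in chunk) == pattern:
            return ("NP", *chunk), i + len(pattern)
    return None


def parse_vp(tokens, i):
    """VP = V NP. Return (tree, next_index) or None."""
    if i < len(tokens) and tokens[i][0] == "V":
        result = parse_np(tokens, i + 1)
        if result:
            np_tree, j = result
            return ("VP", tokens[i], np_tree), j
    return None


def parse_sentence(text):
    """S = NP VP. Return a nested-tuple parse tree, or None if no parse."""
    tokens = tag(text)
    np_result = parse_np(tokens, 0)
    if not np_result:
        return None
    np_tree, i = np_result
    vp_result = parse_vp(tokens, i)
    if not vp_result:
        return None
    vp_tree, j = vp_result
    return ("S", np_tree, vp_tree) if j == len(tokens) else None


print(parse_sentence("The amazing dog rides a skateboard"))
```

The nested tuples are the parse tree: an S node holding an NP (“the amazing dog”) and a VP, which in turn holds the verb and a second NP (“a skateboard”). A sentence that does not match the rules returns None.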
But how does the computer interpret this sentence, beyond simply breaking it down?
This is semantic analysis, or deciphering the meaning conveyed by the text. This consists of mapping the words to the objects in the knowledge base, as well as properly drawing parallels between the words in the sentence, and how they combine.
Dog would be d1, skateboard would be s1, and the sentence becomes Rides(d1, s1). With a rule that anything being ridden has wheels, the computer could infer Wheels(s1). This is a very basic example, but the whole idea is that the computer interprets beyond the text, and assigns associations accordingly.
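Here is a minimal sketch of that inference step, with facts stored as tuples. I am reading the rule as “if something rides y, then y has wheels”, which is my interpretation of the example; the fact format and the function name are assumptions.

```python
# Knowledge base: the dog d1 rides the skateboard s1.
facts = {("Rides", "d1", "s1")}


def infer_wheels(facts):
    """Apply the rule Rides(x, y) -> Wheels(y) once to every fact."""
    derived = set(facts)
    for fact in facts:
        if fact[0] == "Rides":
            _, _rider, ridden = fact
            derived.add(("Wheels", ridden))
    return derived


knowledge = infer_wheels(facts)
print(("Wheels", "s1") in knowledge)
```

The new fact Wheels(s1) never appeared in the sentence. It only exists because the rule was applied to what the sentence said.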
The same thing happens in TikTok — the hashtags, the metadata, certain keywords — it’s obviously quite a lot more sophisticated than that, but at the base level, TikTok breaks down what users are saying, categorizes that, and organizes it alongside other videos.
Computer Vision Technology
This is a field that focuses on ‘enabling computers to identify and process objects in images and videos in the same way that humans do.’ But this is a difficult task, as there is still uncertainty about how exactly the brain processes images.
The application here is a deep learning approach, which uses neural networks, feeding the system many examples of labeled data, allowing it to discern patterns and classify it for future use.
TikTok primarily uses facial feature detection, categorizing users that appear in the videos. It also recognizes objects (makeup brush, skateboard, etc.) and uses them to further classify the video.
The algorithm understands that there is a cat in the picture above (classification). It understands that those are all cat pixels, and can tell where they are in the picture (localization). Thus, when dog and duck are added, it can see that there are four objects in the image, and that they are different (object detection). It can then see that there are four objects, and these are the pixels that belong to each one (instance segmentation).
Evaluating the Video
This all comes together in TikTok by combining computer vision, NLP of the audio in the video, and metadata, such as the hashtags and description underneath the video.
Once the video is published, it gets evaluated based on these metrics. Exolyt has a current trending videos page, and just on first glance, you can see the use of metadata and hashtags to boost engagement. #shotgunfarmers helps to associate that video with the Shotgun Farmers game.
#neverfitin is an ad with Netflix for their new show, Never Have I Ever. The hashtag has had 9.6B views. 9.6 BILLION VIEWS.
Using metadata like a hashtag that has 9.6B views can help to boost video growth. But there are other things that are important in determining the success of a video.
First of all, TikTok seems to like users that stay true to their content. Staying within a vertical (like comedy, dance, creation/DIY) is rewarded more than being experimental. Also, shorter videos (~30 seconds) tend to perform better than longer videos.
The content creators hashtag and use ‘sounds’ to align their content with certain groups. A user can ride on the success of virality by using the sound of a viral video in their own video. There are themes that appear — doing certain challenges, dances, or telling a story using someone else’s dialogue.
The app tends to judge users from the first video that they post. The algorithm is designed to show that first video to more viewers to allow the new account to gain traction. The app rewards success of the first video too, as subsequent videos are more likely to get more reach if the first attempt did well.
Evaluating a Video
How is a video determined to be successful?
- Number of views — how many people get to see the video?
- Viewing Completion Rate — do users watch the video from start to finish or keep scrolling?
- Rewatch Rate — how many users watch it again?
- Engagement Rate — number of shares, comments or likes
- Collaboration — is it using the same sound / engagement tools as another video?
These variables (and more) give users a score. If a video beats this score, it gets boosted more. The video is also reviewed, checked “frame-by-frame by an AI for inappropriate content, copyright issues, etc” which determines if the video stays up or not.
Also, TikTok will retest content weeks after it is published. I have had several videos on my For You page that are from March (which is a bit startling, considering the massive changes in our daily lives since then). But TikTok doesn’t show you the date; it just resurfaces them to retest the AI and the key metrics on the video.
The higher the user score, the more likely their videos are to get boosted up. As with all social media platforms, there is also a dopamine component: users who get hundreds of thousands of views on their videos are going to come back and post again. And viewers, who enter into a content bubble as the platform filters according to their interests, keep on coming back.
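As a rough illustration of how metrics like these could be combined, the sketch below turns completion, rewatch, and engagement rates into one weighted score and compares it to a threshold. None of this is confirmed by TikTok: the `WEIGHTS`, the default `threshold`, and the names are assumptions, and a real system would use far more signals.

```python
from dataclasses import dataclass

# Hypothetical weights for each rate; they sum to 1.
WEIGHTS = {"completion": 0.5, "rewatch": 0.3, "engagement": 0.2}


@dataclass
class VideoStats:
    views: int
    completions: int   # views that reached the end of the video
    rewatches: int     # views that were repeat watches
    interactions: int  # shares + comments + likes


def video_score(stats):
    """Weighted sum of completion, rewatch, and engagement rates."""
    if stats.views == 0:
        return 0.0
    completion_rate = stats.completions / stats.views
    rewatch_rate = stats.rewatches / stats.views
    engagement_rate = stats.interactions / stats.views
    return (
        WEIGHTS["completion"] * completion_rate
        + WEIGHTS["rewatch"] * rewatch_rate
        + WEIGHTS["engagement"] * engagement_rate
    )


def should_boost(stats, threshold=0.4):
    """Boost the video only if its score beats the threshold."""
    return video_score(stats) > threshold


stats = VideoStats(views=1000, completions=800, rewatches=200, interactions=100)
print(round(video_score(stats), 2), should_boost(stats))
```

Using rates rather than raw counts means a video shown to 1,000 people can be compared fairly with one shown to 100,000, which matters if new videos are first tested on a small audience.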
That’s how TikTok added 12M users in March 2020, a 48.3% growth from January 2020.
People spend a lot of time on TikTok too. On average, most users spend ~30 minutes a day on Instagram, a bit less on Snapchat, and most of their time on Facebook. But TikTok users spend 46 minutes on the app per day.
That’s quite powerful. By combining an app that feels homegrown, and encourages creativity and collaboration, the user base has grown, and users from all ages are joining the app. Hollywood Reporter also noted that it “lacked all the hallmarks of corporatization” and it doesn’t feel like users are being sold to. TikTok plans to engage more advertisers moving forward, as outlined in this pitch deck from AdAge.
Conclusion: The Algorithm Works
There is a reason that people keep downloading TikTok and staying: the app can read you. It knows what you like, and you don’t explicitly tell it anything about you. It almost feels like magic: an infinite feed of content, without distracting ads like YouTube, and a more ‘swipable’ UI as compared to Instagram.
Fernando Comet wrote a really interesting article here comparing the UX of Instagram versus TikTok. Instagram is a lot more ad-focused, and has fewer interaction options on the main page. On TikTok’s main page, a user has more than ten options: they can like, comment, share, imitate, see the audio played, etc. On Instagram, you can like and comment, and watch stories.
Broadly speaking, the algorithm more or less does what it is supposed to do: engage users. But there are also worries about filter bubbles, where users don’t see things that are outside of their realm of comfort. There is also the issue of ideological, extreme views being exacerbated by the algorithm, which is problematic. Becca Lewis, a researcher for Data and Society, says,
“If you keep getting content back in a direction that you’re happy with, then a complacency develops where you can just keep getting fed content without thinking of why that content is being placed in front of your eyeballs specifically.”
Source: Becca Lewis
A powerful algorithm must be balanced with cautious usage. Because no one really knows how any of the top social platforms’ algorithms work, it’s important to be aware of what you are engaging with, especially in the age of endless content.
Disclaimer: None of this is investment advice; it is an analysis of the underlying algorithm. I have no affiliation with any social media company.