Social media platforms like Facebook, Twitter, and YouTube have been making significant investments in the development of artificial intelligence to moderate content and automate the removal of harmful posts. These decision-making technologies typically rely on machine learning techniques and are specific to types of content, such as images, videos, sounds, and written text. Some of these AI systems, developed to measure the “toxicity” of text-based content, make use of natural language processing and sentiment assessment to detect harmful text.
While these technologies may appear to represent a turning point in the debate around hate speech on the internet, recent research has shown that they are still far from being able to distinguish context or intent. If such AI tools are entrusted with the power to police content online, they have the potential to suppress legitimate speech and censor the use of specific words, particularly by vulnerable groups.
At InternetLab, we recently conducted a study focused on Perspective, an AI technology developed by Jigsaw (owned by Google’s parent company, Alphabet). The tool measures the perceived level of “toxicity” of text-based content. Perspective defines “toxic” as “a rude, disrespectful, or unreasonable comment that is likely to make you leave a discussion.” Accordingly, the model was trained by asking people to rate internet comments on a scale from “very healthy” to “very toxic.” A comment’s toxicity score thus represents the likelihood that readers will perceive it as toxic.
We used Perspective’s API to compare the perceived levels of toxicity of well-known drag queens and far-right political figures. The study compared the Twitter accounts of all the former participants of RuPaul’s Drag Race with those of far-right leaders such as David Duke, Richard Spencer, Stefan Molyneux, and Faith Goldy. Additionally, we included prominent non-LGBTQ Twitter users, including Donald Trump and Michelle Obama. We analyzed over 114,000 tweets posted in English with Perspective’s most recent version.
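To make the methodology concrete, here is a minimal Python sketch of the kind of query our analysis relied on: a single comment posted to Perspective’s public `comments:analyze` endpoint, requesting a TOXICITY score. The endpoint URL, request fields, and response shape follow Jigsaw’s public documentation for the API, but this is an illustrative sketch rather than the exact code used in the study, and the API key shown is a placeholder.

```python
import json
import urllib.request

# Perspective's public scoring endpoint (per Jigsaw's documentation).
PERSPECTIVE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"


def build_request(text: str) -> dict:
    """Build the JSON body for a Perspective comments:analyze call."""
    return {
        "comment": {"text": text},
        "languages": ["en"],  # our study analyzed English-language tweets
        "requestedAttributes": {"TOXICITY": {}},
    }


def score_toxicity(text: str, api_key: str) -> float:
    """POST one comment to Perspective and return its toxicity score (0.0-1.0)."""
    body = json.dumps(build_request(text)).encode("utf-8")
    req = urllib.request.Request(
        f"{PERSPECTIVE_URL}?key={api_key}",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
    # The summary score is a probability that readers would find the text toxic.
    return result["attributeScores"]["TOXICITY"]["summaryScore"]["value"]
```

Scoring a whole account is then a matter of calling `score_toxicity` on each tweet and averaging the results, which is how the per-account figures below should be read.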
Our results indicate that a significant number of drag queen Twitter accounts were scored as more toxic than those of white nationalist leaders. The drag queens’ average account-level toxicity ranged from 16.68 percent to 37.81 percent, while the white nationalists’ averages spanned from 21.30 percent to 28.87 percent. The toxicity level of President Trump’s Twitter account was 21.84 percent.
We also ran tests measuring the toxicity levels of words commonly found in drag queens’ tweets. These words scored remarkably high: gay (76.10 percent), lesbian (60.79 percent), queer (51.03 percent), and transvestite (44.48 percent). In other words, regardless of context, neutral words such as “gay,” “lesbian,” and “queer” were rated as significantly “toxic” by Perspective’s AI, an indication of important biases in the tool.
Additionally, words such as “fag” (91.94 percent), “sissy” (83.20 percent), and “bitch” (98.18 percent) registered high levels of toxicity. Though those words might be commonly perceived as harmful, their use by members of the LGBTQ community most often serves a different purpose.
Drag queens can be sharp-tongued. From “reads”—a specific form of insult that acerbically exposes someone’s flaws—to harsh jokes and comebacks, drag queens often reclaim words traditionally used as slurs to build a distinctive communication style.
In person, it is easier to understand context and see this as a form of self-expression. But when reading such missives online, it is significantly more challenging to distinguish between harmful and legitimate speech—especially when that assessment is made by machines. These in-group uses were also found in various tweets we analyzed. But in many of those cases, Perspective still deemed the post extremely toxic:
[Embedded tweet] Level of toxicity: 95.98 percent
[Embedded tweet] Level of toxicity: 91.16 percent
Oftentimes, these “harsh” interactions address sensitive topics like sexual roles in relationships, the visibility of gayness, and sexual promiscuity—subjects usually exploited by those who aim to verbally attack LGBTQ people.
But when directed at each other by members of the LGBTQ community, these comments may come from a place of solidarity, not malice. The underlying messages do not promote hate, prejudice, and discrimination. On the contrary, they often evoke pride and self-acceptance, helping LGBTQ people cope with outside hostility.
Hate speech is often predicated on underlying messages as well. When subtext promotes hateful or discriminatory ideas, it poses a threat to marginalized and vulnerable groups. By training its algorithm on what content is likely to be considered toxic, Perspective seems to give more weight to individual words than to their underlying messages.
Though the ideas promoted by white nationalist tweets may target vulnerable groups, Perspective’s AI often categorized them as much less toxic than the drag queens’ tweets:
[Embedded tweet] Level of toxicity: 7.17 percent
[Embedded tweet] Level of toxicity: 6.78 percent
[Embedded tweet] Level of toxicity: 21.7 percent
If this AI tool were empowered to decide which tweets should be removed, many of the drag queens’ posts would be suppressed. In fact, Perspective is already making such decisions.
In March, Jigsaw launched Tune, an experimental browser plugin that uses Perspective to let users set the “volume” of online content on platforms including Facebook, Twitter, YouTube, and Reddit. Users can turn a knob up to see all posts, or turn it down to hide all toxic comments, which are replaced with small colored dots. Tune markets itself around the idea that “abuse and harassment take attention away from online discussions,” claiming that, by using Perspective, it “[helps] you focus on what matters.”
The problem: such AI tools may be built on biased training data, threatening the self-expression and visibility of vulnerable groups. According to our research, 3,925 tweets from drag queens (around 3.7 percent of all the tweets analyzed) would have been hidden from Tune users on “keep it low” mode.
Perspective and similar technologies could thus mistakenly police and censor legitimate LGBTQ speech on online platforms. And if AI tools focus on misleading signals, such as the use of specific words rather than a message’s intent, they will make little progress in removing actual hate speech.
AI tools have the potential to shape the way we communicate. If computers indiscriminately decide what is “toxic,” tech has the power to both impact our modes of expression online and severely limit the inclusiveness of the internet.