The 21st century has opened up a boundless mass of headlines, articles, and stories. This information influx, however, is partially contaminated: Alongside factual, truthful content is fallacious, deliberately manipulated material from dubious sources. According to research by the European Research Council, one in four Americans visited at least one fake news article during the 2016 presidential campaign.
This problem has recently been exacerbated by something called “automatic text generators.” Advanced artificial intelligence software, like OpenAI’s GPT-2 language model, is now being used for things like auto-completion, writing assistance, summarization, and more, and it can also be used to produce large amounts of false information — fast.
To mitigate this risk, researchers have recently developed automatic detectors that can identify this machine-generated text.
However, a team from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) found that this approach was incomplete.
To prove this, the researchers developed attacks that they showed could fool state-of-the-art fake-news detectors. Since the detector thinks that the human-written text is real, the attacker cleverly (and automatically) impersonates such text. In addition, because the detector thinks the machine-generated text is fake, it might be forced to also falsely condemn totally legitimate uses of automatic text generation.
But how can the attackers automatically produce “fake human-written text”? If it’s human-written, how can it be automatically produced?
The team came up with the following strategy: Instead of generating the text from scratch, they used the abundance of existing human-written text, but automatically corrupted it to alter its meaning. To maintain coherence, they used a GPT-2 language model when performing the edits, demonstrating that its potential malicious uses are not limited to generating text.
“There’s a growing concern about machine-generated fake text, and for a good reason,” says CSAIL PhD student Tal Schuster, lead author on a new paper on their findings. “I had an inkling that something was lacking in the current approaches to identifying fake information by detecting auto-generated text — is auto-generated text always fake? Is human-generated text always real?”
In one experiment, the team simulated attackers that use auto-completion writing assistance tools similar to legitimate sources. The legitimate source verifies that the auto-completed sentences are correct, whereas the attackers verify that they’re incorrect.
For example, the team used an article about NASA scientists describing the collection of new data on coronal mass ejections. They prompted a generator to produce information on how this data is useful. The AI gave an informative and fully correct explanation, describing how the data will help scientists to study the Earth’s magnetic fields. Nevertheless, it was identified as “fake news.” The fake news detector could not differentiate fake from real text if they were both machine-generated.
“We need to have the mindset that the most intrinsic ‘fake news’ characteristic is factual falseness, not whether or not the text was generated by machines,” says Schuster. “Text generators don’t have a specific agenda — it’s up to the user to decide how to use this technology.”
The team notes that, since the quality of text generators is likely to keep improving, the legitimate use of such tools will most likely increase — another reason why we shouldn’t “discriminate” against auto-generated text.
“This finding of ours calls into question the credibility of current classifiers in being used to help detect misinformation in other news sources,” says MIT Professor Regina Barzilay.
Schuster and Barzilay wrote the paper alongside Roei Schuster from Cornell Tech and Tel Aviv University, as well as CSAIL PhD student Darsh Shah.
Bias in AI is nothing new — our stereotypes, prejudices, and partialities are known to affect the information that our algorithms hinge on. A sample bias could ruin a self-driving car if there’s not enough nighttime data, and a prejudice bias could unconsciously reflect personal stereotypes. If these predictive models learn based on the data they’re given, they’ll undoubtedly fail to understand what’s true or false.
With that in mind, in a second paper, the same team from MIT CSAIL used the world’s largest fact-checking dataset, Fact Extraction and VERification (FEVER), to develop systems to detect false statements.
FEVER has been used by machine learning researchers as a repository of true and false statements, matched with evidence from Wikipedia articles. However, the team’s analysis showed staggering bias in the dataset — bias that could cause errors in models it was trained on it.
“Many of the statements created by human annotators contain giveaway phrases,” says Schuster. “For example, phrases like ‘did not’ and ‘yet to’ appear mostly in false statements.”
One bad outcome is that models trained on FEVER viewed negated sentences as more likely to be false, regardless of whether they were actually true.
“Adam Lambert does not publicly hide his homosexuality,” for instance, would likely be declared false by fact-checking AI, even though the statement is true, and can be inferred from the data the AI is given. The problem is that the model focuses on the language of the claim, and doesn’t take external evidence into account.
Another problem of classifying a claim without considering any evidence is that the exact same statement could be true today, but be considered false in the future. For example, until 2019 it was true to say that actress Olivia Colman had never won an Oscar. Today, this statement could be easily refuted by checking her IMDB profile.
With that in mind, the team created a dataset that corrects some of this through de-biasing FEVER. Surprisingly, they found that the models performed poorly on their unbiased evaluation sets, with results dropping from 86 percent to 58 percent.
“Unfortunately, the models seem to overly rely on the bias that they were exposed to, instead of validating the statements in the context of given evidence,” says Schuster.
Armed with the debiased dataset, the team developed a new algorithm that outperforms previous ones across all metrics.
“The algorithm down-weights the importance of cases with phrases that were specifically common with a corresponding class, and up-weights cases with phrases that are rare for that class,” says Shah. “For example, true claims with the phrase ‘did not’ would be upweighted, so that in the newly weighted dataset, that phrase would no longer be correlated with the ‘false’ class.”
The team hopes that, in the future, combining fact-checking into existing defenses will make models more robust against attacks. They aim to further improve existing models by developing new algorithms and constructing datasets that cover more types of misinformation.
“It’s exciting to see research on detection of synthetic media, which will be an increasingly key building block of ensuring online security going forward as AI matures,” says Miles Brundage, a research scientist at OpenAI who was not involved in the project. “This research opens up AI’s potential role in helping to address the problem of digital information, by teasing apart the roles of factual accuracy and provenance in detection.”
A paper on the team’s contribution to fact-checking, based on debiasing, will be presented at the Conference on Empirical Methods in Natural Language Processing in Hong Kong in October. Schuster wrote the paper alongside Shah, Barzilay, Serene Yeo from DSO National Laboratories, MIT undergraduate Daniel Filizzola, and MIT postdoc Enrico Santus.
This research is supported by Facebook AI Research, who granted the team the Online Safety Benchmark Award.