When Alex Stamos describes the challenge of studying the worst problems of mass-scale bad behavior on the internet, he compares it to astronomy. To chart the cosmos, astronomers don’t build their own Hubble telescopes or Arecibo observatories. They concentrate their resources in a few well-situated places, and share time on expensive hardware. But when it comes to tackling internet abuse ranging from extremism to disinformation to child exploitation, Stamos argues, Silicon Valley companies and academics are still trying to build their own telescopes. What if they instead shared their tools and, more importantly, the massive datasets they’ve assembled?
That’s the idea behind the Stanford Internet Observatory, founded with a $5 million donation from Craigslist creator Craig Newmark, and part of the Stanford Cyber Policy Center where Stamos is a visiting professor. It aspires to be a central outlet for the study of all manner of internet abuse, assembling for visiting researchers the necessary machine learning tools, big data analysts, and perhaps most importantly, access to major tech platforms’ user data—a key to the project that may hinge on which tech firms cooperate, and to what degree.
As an example of a potential undertaking, Stamos points to political disinformation of the kind that roiled the 2016 presidential election during his time at Facebook, a problem that has become the most glaring example of Silicon Valley’s blind spots around abuse of its services. “Misinformation is not just a computer science problem. It’s a problem that brings in political science, sociology, psychology,” Stamos says. “Part of the idea of the Internet Observatory is to build a place for these people to work together, and we want to build the infrastructure necessary to allow all the different parts of the political and social sciences to study what’s happening online.”
The observatory is currently negotiating with tech firms (Stamos names Facebook, Google, Twitter, YouTube, and Reddit as examples) that it hopes will offer access to user data via API, both in real time and in historical archives. The observatory will then share that access with social scientists who might have a specific research project but lack the connections or resources to grapple with the immensity of the data involved. Stamos hopes that his data clearinghouse might lower the technical barriers social scientists face now when they try to study users on the internet at scale.
“They have to have a grad student write Python, they have to spend months negotiating data access agreements with tech companies, they have to build a bunch of data science infrastructure,” Stamos says. “We’re trying to do that work once, and offer it to all these people.”
First, Get the Data
But negotiating that access to data may not be an easy sell, even for someone with as many Silicon Valley connections as Stamos. Facebook has been wary of any data-sharing agreements with academics since its disastrous Cambridge Analytica scandal, a privacy debacle—which happened under Stamos’ watch—for which the FTC announced a $5 billion fine against the company just yesterday. The European Union’s General Data Protection Regulation also limits what sort of data tech firms can share about European users. When WIRED reached out to Twitter, Google, Facebook, and Reddit about the observatory’s plan, Twitter and Reddit declined to comment, though a Reddit spokesperson said the company hadn’t yet been approached to share its data. Facebook and Google didn’t respond.
As a model for how those data sharing deals can actually be struck, though, Stamos points to another project known as Social Science One. When it was created in April of 2018, that initiative hammered out a deal with Facebook to access some of its user data as part of its efforts to combat disinformation intended specifically to influence democratic elections. That data-sharing arrangement uses a form of so-called differential privacy, a still-developing class of tools that allow data to be queried in aggregate while limiting the details included in responses. It means that no uniquely identifying information is ever shared about individuals.
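To make the idea of querying data "in aggregate" concrete, here is a minimal sketch of one textbook differential privacy technique, the Laplace mechanism applied to a counting query. It is an illustration only, not a description of how Social Science One or Facebook actually implement their arrangement. The record layout, the `shared_link` field, and the function names are hypothetical, and the privacy parameter `epsilon` is chosen arbitrarily.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise as the difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def noisy_count(records, predicate, epsilon, rng):
    """Return a count of matching records with Laplace noise added.

    A counting query has sensitivity 1: adding or removing one person's
    record changes the true count by at most 1, so the noise scale is
    1 / epsilon. Smaller epsilon means more noise and stronger privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random()
    # Hypothetical dataset: one record per user, flagging whether the
    # user shared a particular link.
    users = [{"shared_link": i % 4 == 0} for i in range(1000)]
    answer = noisy_count(users, lambda u: u["shared_link"], epsilon=0.5, rng=rng)
    print(f"Approximate number of users who shared the link: {answer:.1f}")
```

The researcher sees only the noisy total, never the individual records, and the random noise masks whether any single person is present in the data. Averaged over many users the answer remains useful, which is the trade-off these tools aim for.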
“This really rose out of the ashes of the Cambridge Analytica scandal,” says Nate Persily, who founded Social Science One and also serves as co-director of the Stanford Cyber Policy Center that will house the Internet Observatory. “This is an attempt to figure out a safe, secure, privacy-protective way to make this data available to academics.”
While Stamos’ observatory hopes to access some companies’ data through Social Science One and through the observatory’s own direct negotiations, in other cases it plans to take a more direct approach: Simply scraping up public data without asking permission. After all, Stamos points out, much of the internet’s extremist and abusive behavior lives on sites like 4chan, 8chan, Voat and Gab, not the mainstream sites that might partner with his project. While scraping those sites might seem intrusive, Stamos points out that users of these sites are generally anonymous by default and public in their postings.
“Right now you cannot study what led to the Christchurch shooting, because that data has been intentionally pulled off and purged to cover tracks,” Stamos says, referring to the killing of 51 people at two mosques in Christchurch, New Zealand, an act whose perpetrator posted a manifesto to the fringe social media site 8chan. Privacy issues around those sites, says Stamos, are “something we’re well aware of, and we’re trying to be careful about in our use of this. But in the end if you want to understand these problems you can’t do so without understanding the darkest corners of the internet.”
From Security to ‘Abusability’ Education
Stamos’ Internet Observatory idea came into being when he met Craig Newmark at an Aspen Cybersecurity Summit reception last summer. Newmark, who has given more than $100 million to projects focused on what he describes as “information warfare,” says he was impressed with the approach Stamos described. “This is real World War II, greatest generation stuff. The need is dire, the emergency is real,” Newmark says. “People have to stand up and be patriots. That means platforms, and researchers and funders. Alex and people like him are on the front lines.”
More broadly, Stamos says his goal with the observatory—and his plans for a Stanford undergraduate education program linked to it—is to push for more systematic thinking about abuse across tech firms, a shift he describes as similar to the cybersecurity evolution the tech industry underwent 20 years ago. Back then, when Stamos was beginning his career at the legendary cybersecurity consultancy @stake, companies were just waking up to the insecurity of their code, and learning to cooperate with the academic researchers and white hat hackers poking holes in their products.
“We’re now in that same place with bigger trust, safety and privacy issues, in that our industry doesn’t know how to build software that can be trusted by users to operate in their best interest and protect them from all these kinds of abuse,” Stamos says. He argues a new generation of engineers needs to learn to think just as systematically about abusability, meaning how their tools can have unexpected and dangerous effects in the real world, as they do about security. “If you just have the skillset you normally get from a computer science education, you’re completely unprepared for the kinds of abuse that will happen on your product.”
Through a combination of education, research, and lobbying, Stamos hopes his observatory can nudge the tech industry towards taking those abusability problems seriously—and show them real solutions based on deep analysis across the entire internet’s data. “If we want companies to make smart decisions, we have to create the intellectual framework for them to base it on, and we have to lobby them as well,” Stamos says. “The unfortunate truth is that the most important decisions in balancing privacy and safety online are not being made in DC, Brussels, or Paris. They’re being made in Silicon Valley.”