Robert Chang, a Stanford ophthalmologist, normally stays busy prescribing drops and performing eye surgery. But a few years ago, he decided to jump on a hot new trend in his field: artificial intelligence. Doctors like Chang often rely on eye imaging to track the development of conditions like glaucoma. With enough scans, he reasoned, he might find patterns that could help him better interpret test results.
Gregory Barber covers cryptocurrency, blockchain, and artificial intelligence for WIRED.
That is, if he could get his hands on enough data. Chang embarked on a journey that’s familiar to many medical researchers looking to dabble in machine learning. He started with his own patients, but that wasn’t nearly enough, since training AI algorithms can require thousands or even millions of data points. He filled out grants and appealed to collaborators at other universities. He went to donor registries, where people voluntarily bring their data for researchers to use. But pretty soon he hit a wall. The data he needed was tied up in complicated rules for sharing data. “I was basically begging for data,” Chang says.
Chang thinks he might soon have a workaround to the data problem: patients. He’s working with Dawn Song, a professor at the University of California-Berkeley, to create a secure way for patients to share their data with researchers. It relies on a cloud computing network from Oasis Labs, founded by Song, and is designed so that researchers never see the data, even when it’s used to train AI. To encourage patients to participate, they’ll get paid when their data is used.
That design has implications well beyond healthcare. In California, Governor Gavin Newsom recently proposed a so-called “data dividend” that would transfer wealth from the state’s tech firms to its residents, and US Senator Mark Warner (D-Virginia) has introduced a bill that would require firms to put a price tag on each user’s personal data. The approach rests on a growing belief that the tech industry’s power is rooted in its vast stores of user data. These initiatives would upset that system by declaring that your data is yours, and that companies should pay you to use it, whether it’s your genome or your Facebook ad clicks.
In practice, though, the idea of owning your data quickly starts looking a little … fuzzy. Unlike physical assets like your car or house, your data is shared willy-nilly around the web, merged with other sources and, increasingly, fed through a Russian doll of machine learning models. As the data transmutes form and changes hands, its value becomes anybody’s guess. Plus, the current way data is handled is bound to create conflicting incentives. The priorities I have for valuing my data (say, personal privacy) conflict directly with Facebook’s (fueling ad algorithms).
Song thinks that for data ownership to work, the whole system needs a rethink. Data needs to be controlled by users, but still usable to others. “We can help users to maintain control of their data and at the same time to enable data to be utilized in a privacy preserving way for machine learning models,” she says. Health research, Song says, is a good way to start testing those ideas, in part because people are already often paid to participate in clinical studies.
This month, Song and Chang are starting a trial of the system, which they call Kara, at Stanford. Kara uses a technique known as differential privacy, where the ingredients for training an AI system come together with limited visibility to all parties involved. Patients upload pictures of their medical data—say, an eye scan—and medical researchers like Chang submit the AI systems they need data to train. That’s all stored on Oasis’s blockchain-based platform, which encrypts and anonymizes the data. Because all the computations happen within that black box, the researchers never see the data they’re using. The technique also draws on Song’s prior research to help ensure that the software can’t be reverse-engineered after the fact to extract the data used to train it.