MIT professors David Sontag and Peter Szolovits don’t assign a textbook for their class, 6.S897/HST.956 (Machine Learning for Healthcare), because there isn’t one. Instead, students read scientific papers, solve problem sets based on current topics like opioid addiction and infant mortality, and meet the doctors and engineers paving the way for a more data-driven approach to health care. Jointly offered by MIT’s Department of Electrical Engineering and Computer Science (EECS) and the Harvard-MIT Program in Health Sciences and Technology, the class is one of just a handful offered across the country.
“Because it’s a new field, what we teach will help shape how AI is used to diagnose and treat patients,” says Irene Chen, an EECS graduate student who helped design and teach the course. “We tried to give students the freedom to be creative and explore the many ways machine learning is being applied to health care.”
Two-thirds of the syllabus this spring was new. Students were introduced to the latest machine-learning algorithms for analyzing doctors’ clinical notes, patient medical scans, and electronic health records, among other data. Students also explored the risks of using automated methods to analyze large, often messy observational datasets, from confusing correlation with causation to understanding how AI models can make bad decisions based on biased data or faulty assumptions.
With all of the hype around AI, the course had more takers than seats. After 100 students showed up on the first day, the instructors assigned a quiz to test their knowledge of statistics and other prerequisites. That helped whittle the class down to 70. Michiel Bakker, a graduate student at the MIT Media Lab, made the cut and says the course exposed him to medical concepts that most engineering courses don’t cover.
“In machine learning, the data are either images or text,” he says. “Here we learned the importance of combining genetic data with medical images with electronic health records. To use machine learning in health care you have to understand the problems, how to combine techniques and anticipate where things could go wrong.”
Most lectures and homework problems focused on real-world scenarios, drawing from MIT’s MIMIC critical care database and a subset of the IBM MarketScan Research Databases focused on insurance claims. The course also featured regular guest lectures by Boston-area clinicians. In a reversal of roles, students held office hours for doctors interested in integrating AI into their practice.
“There are so many people in academia working on machine learning, and so many doctors at hospitals in Boston,” says Willie Boag, an EECS graduate student who helped design and teach the course. “There’s so much opportunity in fostering conversation between these groups.”
In health care, as in other fields where AI has made inroads, regulators are discussing what rules should be put in place to protect the public. The U.S. Food and Drug Administration recently released a draft framework for regulating AI products, which students got to review and comment on, in class and in feedback published online in the Federal Register.
Andy Coravos, a former entrepreneur in residence at the FDA, now CEO of ElektraLabs in Boston, helped lead the discussion and was impressed by the quality of the comments. “Many students identified test cases relevant to the current white paper, and used those examples to draft public comments for what to keep, add, and change in future iterations,” she says.
The course culminated in a final project in which teams of students used the MIMIC and IBM datasets to explore a timely question in the field. One team analyzed insurance claims to explore regional variation in screening patients for early-stage kidney disease. Many patients with hypertension and diabetes are never tested for chronic kidney disease, even though both conditions put them at high risk. The students found that they could predict fairly well who would be screened, and that screening rates diverged most between the southern and northeastern United States.
“If this work were to continue, the next step would be to share the results with a doctor and get their perspective,” says team member Matt Groh, a PhD student at the MIT Media Lab. “You need that cross-disciplinary feedback.”
The MIT-IBM Watson AI Lab made the anonymized data available to students on the IBM cloud out of an interest in helping to educate the next generation of scientists and engineers, says Kush Varshney, principal research staff member and manager at IBM Research. “Health care is messy and complex, which is why there are no substitutes for working with real-world data,” he says.
Szolovits agrees. Using synthetic data would have been easier but far less meaningful. “It’s important for students to grapple with the complexities of real data,” he says. “Any researcher developing automated techniques and tools to improve patient care needs to be sensitive to the many nuances of that data.”
In a recent recap on Twitter, Chen gave shout-outs to the students, guest lecturers, professors, and her fellow teaching assistant. She also reflected on the joys of teaching. “Research is rewarding and often fun, but helping someone see your field with fresh eyes is insanely cool.”