AI will transform clinical medicine. Physicians can make sure it’s for the better.
Grayson Armstrong MD’15, MPH, has seen the future of AI in patient care, and it’s in Madurai, India.
More than a decade ago, computer scientists from Google partnered with physicians at Aravind Eye Hospital in this large Tamil Nadu city. They wanted to improve diagnosis of diabetic retinopathy, a leading cause of blindness—and entirely preventable when caught early and treated.
Prioritizing affordability and access—most of the millions of Indians at risk of going blind live far from an ophthalmologist—the team developed an artificial intelligence algorithm, trained on millions of retinal scans, that could identify diabetic retinopathy and rate its severity. Technicians screen patients at dozens of satellite facilities throughout the state, the AI model offers a preliminary diagnosis, and experts back at the main hospital review it and sign off, within an hour.
“Most of the patients can be taken care of locally,” says Armstrong, an ophthalmologist with Massachusetts Eye and Ear who visited Aravind last summer. “They saw over a million patients last year via telemedicine.”
Lack of access to ophthalmologists is hardly a problem unique to rural India. Armstrong, who directs Mass Eye and Ear’s 24/7 eye emergency department, sees patients every day who didn’t, or couldn’t, seek help until it was too late. “Whether you’re … in the middle of nowhere or if you’re in the heart of Boston, there’s still a difficulty in accessing subspecialty care,” Armstrong says. “And people lose vision as a result.”
In Armstrong’s field, the AI revolution happening at Aravind is “above and beyond what anyone in the United States is able to do right now,” he says. That said, most medical specialties, including his own, are putting algorithms and devices into practice almost daily to analyze images and speech, suggest diagnoses and treatment options, triage and monitor hospital patients, and take on burdensome administrative tasks that contribute to physician burnout.
AI, if developed thoughtfully and with the active participation of clinicians, has the potential to make their jobs better while improving patient care. On the flip side, flashy new tools could be implemented without physician input, upending workflows and making doctors’ jobs harder and less satisfying. Or the tools’ costs could be justified by clinicians’ efficiency gains, meaning doctors are simply asked to see more patients. From bias to fabricated data to privacy breaches, the risks are many and the concerns are justified.
Yet as health care needs expand and the physician workforce continues to contract, AI may just become a necessity to fill the gap.
“The population is getting older and eye diseases are getting more prevalent … but the number of ophthalmologists has shrunk and will continue to shrink,” Armstrong says, noting that the same is true for most specialties. “AI just basically replicates the knowledge at some level of a skilled professional, but it makes it so easy and accessible, and cheap to do.”
Last summer Mass General Brigham, Mass Eye and Ear’s health system, dipped a toe into the AI diagnostic pond when it began deploying devices similar to Aravind’s at some of its primary care centers. Clinicians screen patients with diabetes as part of their routine appointments, and refer them to a specialist if they find anything problematic. Armstrong says that on average, about 20 percent to 30 percent of patients have screened positive for diabetic retinopathy—and the rest avoided an unnecessary eye appointment.
The machine, which takes digital pictures of the retina and uploads them to an algorithm for analysis, has been “found to be more accurate in some ways than an ophthalmology dilated exam,” Armstrong says; ophthalmic exams can also be uncomfortable for the patient, who might blink or flinch.
The photo “has the data there, and the AI can be run and find the minutiae, the tiny little microscopic pixels that show us there’s a problem that we as an ophthalmologist might have missed,” he adds. “And it can pick up patterns that we as ophthalmologists can’t do ourselves.”
Many proponents tout an indefatigable machine’s inherent superiority to the human eye. Andy Beck ’02 MMSc’06 MD’06, PhD, is the cofounder of PathAI, which digitizes pathology slides and runs an algorithm that analyzes and prioritizes patient cases. He points out that on a typical day a pathologist looks at hundreds of slides, each of which contains about 100,000 cells. “Humans are very, very good at many, many things, like judgments and pattern recognition at a high level. But what we’re not as good at is examining 100,000 things on a single slide,” Beck says. “So the magnitude of the problem is huge.”
Harvard gastroenterologist Tyler Berzin ’99 MD’03, MS, who helped develop EndoScreener, an AI-assisted colon polyp detector, cites similarly daunting figures: up to 20 procedures per day per endoscopist; an endoscopic video stream generating around 27,000 high-definition images per procedure; “innumerable pixels” per image. “That’s a tremendous amount of data for the human eye to scan. It’s too much for the brain to process,” Berzin says. “It really is not possible for physicians to catch every single finding in every single pixel every second,” while “AI can work tirelessly on mountains of data.”
The FDA has approved about half a dozen algorithms, including EndoScreener, for colon polyp detection, and randomized controlled trials show they improve detection rates by up to 30 percent—“the largest leap in precancerous polyp detection of any technology applied to colonoscopy in the last 20 years,” Berzin says.
In other areas of medicine, however, humans still have an edge. A predictive AI sepsis detection algorithm developed by a large health care software company was overhauled in 2022 after studies found it missed most cases—a dramatic example of the life-or-death implications of this new health care frontier.
“We had no compelling reason to think this is any better than more traditional kinds of [sepsis] screenings,” says Jared Anderson, MD, the director of informatics and of quality and patient safety for Brown Emergency Medicine, which tried out the earlier version of the tool but never used it to direct patient care. (Users of the revamped algorithm are advised to train it on local populations—a time- and staff-intensive process.) Anderson says they ultimately discontinued it partly because, when trying to interpret the algorithm’s decisions, “it didn’t make a lot of sense.”
This is AI’s black box problem: users’ inability to understand how a machine learning model comes to its conclusions. “It’s very hard for people to just blindly trust something that offers you a prediction with really no understanding of how it came to that prediction,” Anderson says.
Drugs and devices often fail when they move from bench to bedside; real-world validation and user feedback are no less critical for medical AI tools. In the Mass Eye and Ear ED, Armstrong and colleagues developed an algorithm to predict whether a patient will get their vision back after a major eye trauma, and maybe “give them some hope,” Armstrong says. They’re also planning to test a generative AI program to triage patients in the ED, to determine if they truly need care now, or could see a doctor later. If it’s deemed safe, patients could use it before going to the hospital, potentially saving them a trip.
“I’m personally a little skeptical that this [AI] can accurately keep a patient home without them going blind,” Armstrong says. “It’s one thing to test these things in a clinical trial and perfect research conditions, and then it’s very different when you employ it with patients that may not understand what the question means, or may not speak English.”
When Brown’s director of cardiovascular research, Gaurav Choudhary, MBBS, teamed up with digital stethoscope maker Eko to develop a machine learning algorithm to detect pulmonary hypertension and grade its severity, he ran a small feasibility study at the Providence VA Medical Center, where he’s the associate chief of staff for research. After getting promising preliminary data and an NIH grant, the team is now enrolling up to 2,000 participants at Rhode Island Hospital—a far more diverse setting than the VA, where “almost 100 percent of them were men,” Choudhary says. They hope to add more sites to the study, for geographical diversity.
There’s one challenge that, due to feasibility issues, the team can’t avoid: a potential for bias, because they are enrolling participants who have already been referred for echocardiograms, meaning heart disease is already suspected. Ultimately, though, the Eko stethoscope and its algorithm will be used in an outpatient setting by PCPs to detect pulmonary hypertension and refer people to specialists.
“The generalizability of an algorithm is, all in all, obviously of concern … and I suspect this will be a continuous learning thing,” says Choudhary, who is also the Ruth and Paul Levinger Professor of Cardiology. Even after the FDA approves an AI algorithm, “as more data is added, you may have to continue to refine it”—something that’s harder to do with approved medical devices and drugs.
Beck says that’s exactly what PathAI does. “When products are released to market, it’s really important to have—which we have—monitoring built in, and then incorporating feedback from our users so that we can really keep track of when it’s working well and, even more importantly, identify very quickly when the system isn’t performing,” he says. Though PathAI trained its algorithm on diverse datasets, the company works with about 500 pathologists around the world who help to identify performance gaps that “you may not even know about until something is widely deployed.” Berzin adds that while “the training libraries for polyp detection are constantly getting expanded,” most of the data are proprietary. So although colon polyps probably look the same in all patients, he says it’s “an open question how representative” of the patient population the data are.
Ainsley MacLean ’01 MD’05, the chief medical information officer and chief AI officer for the Mid-Atlantic Permanente Medical Group, says when her team began looking into AI mammography platforms, they went with a London-based vendor that developed its product using diverse sets of patient data. “It is important that AI be applied equitably across an entire ecosystem and constantly evaluated for potential bias,” she says.
MacLean, a radiologist, is the principal investigator on a trial of the technology, which assists physicians by flagging potentially cancerous spots on the x-rays. Though it hasn’t been cleared by the FDA, European studies “showed increased detection of … breast cancers that radiologists can’t even see yet,” she says. “If I could have my mammogram read by a trained, highly skilled radiologist like we do now, and an AI … why would I not want that?”
For all of her enthusiasm for AI’s potential, and despite the hundreds of AI-powered radiology tools that the FDA has approved, MacLean has not yet implemented any of them in her medical group, outside of pilots or research. “You don’t want to interrupt the radiologist’s workflow,” she says. “A lot of these AIs will just create a bunch of circles on a screen that are distracting” and, in the tools she’s tried so far, “not accurate.” But the technology is improving quickly, she adds; “I suspect within two years we’ll be using more AI in those settings.”
The radiologists at Kaiser Permanente have embraced a different kind of AI tool, one that’s been “truly transformative,” MacLean says. Three years ago, they worked with a company to develop an individualized, generative AI reporting platform to help physicians summarize their findings in the impression section and make recommendations. “The first time I used it, I was blown away,” MacLean recalls. The algorithm, which was trained on thousands of her past reports, “generated an impression that sounded exactly like me,” and even included a differential diagnosis that she hadn’t thought of.
It got better: “Normally, when I work an overnight, I will just kind of collapse in bed afterwards,” due to the cognitive fatigue, MacLean says. “The first time I used this, I literally went out and picked up my kids. I was able to drive them around.” Later, when they rolled out an AI scribe, a physician told her that on his first day using it, he went to the gym after work—something he hadn’t done since college.
“Our doctors are saving close to an hour a day,” MacLean says. “The technology is not cheap. But we made [the investment] because you know what’s also expensive? Not having doctors, and having burnt-out doctors.”
Return on investment is a critical factor for anyone considering a novel technology. Paul Larson, MD, MBA, chief of primary care for Brown University Health, who helped lead their pilot of an AI scribe called DAX Copilot, says some health systems justify the cost with increased productivity—clinicians have more time to see more patients—but for now, his team is focused on improving physician satisfaction and reducing burnout.
“The idea of going home and spending an hour, an hour and a half on your notes is just completely unsustainable,” says Larson, a clinical assistant professor of family medicine. “Clinicians just are very overwhelmed by the documentation.”
The AI scribe, which listens to each patient encounter through a clinician’s phone and then populates the EHR, doesn’t just save valuable time. This futuristic addition to the exam room is allowing doctors to talk to patients face to face, without distraction—to practice medicine the way they’d envisioned back when they were in school. “Once you’ve cued it, you just ignore it completely. And you have a completely normal interaction with the patient,” Larson says.
Like MacLean, Larson has been astonished, even humbled, by the quality of the notes the AI scribe produces: it rearranges nonlinear conversations into topical paragraphs; it excludes irrelevant chitchat; it organizes his recommendations and diagnoses in numbered, thorough paragraphs. “The computer writes better notes than I ever did,” Larson says.
And that quality has improved over time, with software updates in response to user feedback. As with human scribes, clinicians must review the AI scribe’s notes and then sign them. At first, Larson sometimes made “ever-so-slight edits,” like correcting a patient’s pronouns or their relationship to someone else in the room; “now it’s rare to none that I ever edit a note.” Instead of taking up to 90 minutes to close his notes at the end of the day, he’s done “within about five or six minutes.”
The word has gotten out: a few dozen clinicians in various specialties at Brown Health are using DAX Copilot, and there’s a waiting list. Meanwhile Brown Emergency Medicine is piloting the scribe, Anderson says, and while so far it’s performing well with patients who are “awake, alert, and oriented,” he wants to see how it performs in more challenging situations—“you know, the patient who’s getting evaluated for a stroke or intoxication or confusion,” perhaps while someone elsewhere in the ED is yelling in the background. “I’d be very curious to push it to its limit,” he adds, but he’ll have to wait until this summer to do that: the few licenses available now are for iPhones, and he has an Android.
AI can help clinicians in other ways, like pending orders for prescriptions and screenings, and attaching diagnostic codes—both recent additions to the software that Brown Health is using. “There’s a long list of other functions that are coming down the pike,” Larson adds. But “the Holy Grail would be the inbox”: the relentless stream of patient portal messages that require so much time and effort. “The inbox is probably driving burnout for clinicians, and dissatisfaction with the [electronic health] record, faster than is documenting an in-person encounter,” he says.
There is so much more that AI can do for doctors and patients, so long as there’s money to invest in the technology, time to train clinicians to use it, and, most of all, buy-in: the device or algorithm has to solve an existing problem in a meaningful way, so that physicians want it (or patients demand it). For example, Beck says only about 10 percent of pathology labs worldwide have gone digital, a percentage he thinks will grow rapidly as regulators approve more AI-powered diagnostics. But it will only grow if companies like his listen to physicians as they develop their products.
“Why has the microscope persisted for 150 years? It’s obviously solving a lot of problems extremely well. So it’s this humility to learn about what’s working today, and what really frustrates people today, and then being laser focused on that, far more than the technology,” Beck says.
So what do doctors want? Anderson (speaking on his own behalf, not his employers’) says the ability of AI to ingest and summarize massive amounts of text in seconds would be a huge help in the ED. If a patient with a complicated medical history comes in complaining of dizziness, for example, Anderson could much more easily narrow down dozens of theoretical causes by knowing that the patient has had frequent bacterial bloodstream infections with similar symptoms.
“That would greatly decrease my cognitive burden of needing to either sort through the chart for half an hour to try to figure out what the heck is going on, or … you spend two or three minutes, make a good guess, and maybe you’re not able to make the optimal medical decision,” he says. “I think [this] happens to myself and my colleagues frequently.”
AI information retrieval would have been a game-changer for Laura Mercurio MD’14 RES’17 F’20, assistant professor of pediatrics and of emergency medicine, during an overnight in Hasbro Children’s Hospital’s ED last year, when a young boy with congenital heart disease came in with a stroke. “It’s rare on rare,” she recalls. “What I had to do is find out very quickly, what are the types of things I can do to stabilize his bleeding without clotting off the stents that he had in his heart?”
So Mercurio found herself, at 3:41 a.m., on three simultaneous phone calls with experts in Providence and Boston, trying to save the child’s life—which they did. And while it would have been a stressful situation no matter what, “wouldn’t it have been nice if I just had one way to get [the information]?” she says. “AI, in rare and unusual diagnoses, can be incredibly helpful for us.” She adds, “We have so much information, it’s now become well beyond the human scope to process in real time.”
Jay Gopal ’25 MD’29, who works on MacLean’s AI team at Kaiser Permanente, wants to harness all that data to prevent disease before it starts. He started working with medical AI models eight years ago, in the ninth grade, when he created an award-winning early-diagnosis tool for glaucoma. He’s also collaborated with a cardiologist at Brown to develop an algorithm to identify heart failure risk, and a group at Stanford to use computer vision to monitor hospital patients.
“I see AI as a transformative force that will advance proactive medicine, helping us intervene earlier and improve patient outcomes,” Gopal says. But it’s equally important to understand how an algorithm arrives at a diagnosis—the black box issue that sank the sepsis detection tool—so that clinicians take proper action. “We’re refining the explainability of AI models so physicians can trust the results, reduce diagnostic errors, plan targeted interventions, and deliver better care,” he says.
Armstrong stresses that no digital solution, no matter how miraculous it sounds, can help patients if their “real-world struggles” remain unaddressed. A PCP can find diabetic retinopathy with an AI camera, but they can’t guarantee the patient will go to the ophthalmologist and get the care they need. “We always assume that these tools are going to help improve health disparities,” Armstrong says. But in his studies of tele-ophthalmology use during pandemic lockdowns, he found that the patients most likely to use those services at Mass Eye and Ear “were actually the richer, white, English-speaking patients, and all these other patients were being left behind,” he says.
In November JAMA Network Open published a small study in which ChatGPT correctly diagnosed more clinical cases than physicians alone, or even physicians using the chatbot. Similarly, Berzin says, AI may outperform both humans and “the human-AI hybrid” in polyp detection and diagnosis. “It makes people a little nervous to come to grips with that,” he says, but in some areas of prediction and diagnosis, “it is pretty likely that AI tools are just going to be better than doctors.”
While Anderson wasn’t surprised by the study findings, he adds, “ChatGPT is not going to be able to walk into a room with a patient that is drunk and half asleep and has to be shaken awake to provide history, and then come up with diagnoses as well as a human.”
Gopal, whom MacLean calls “the future of AI,” says when he worked with her to refine clinical guidelines for AI at Kaiser Permanente, they emphasized that “the provider always has the final say on clinical management and diagnosis.” While Gopal wants clinicians to welcome new technology, it’s an “additional data point,” he says: “A physician’s clinical judgment is the crux of medicine. AI can never replace that.” Berzin, who has written about the legal issues of physicians using AI and who (or what) is ultimately liable for a diagnosis, adds, “There are going to be very strong guardrails around ensuring that physicians have some primary responsibility for at least some level of review of the data, even if AI has done a lot of the heavy lifting.”
Throughout the history of medicine, doctors have pooh-poohed, and then embraced, new technology. Choudhary says cardiologists have long lamented the loss of auscultation skills with the advent of imaging tools. “It’s a dying art, and I think it’s sad,” he admits. But like the AI diabetic retinopathy detector in use in India, the AI stethoscope would make a big difference in low-resource settings where echocardiograms are hard to come by. “There’s a lot of pulmonary hypertension associated with schistosomiasis around the world,” Choudhary says. “So the opportunity to have an algorithm like this would be awesome.” Meanwhile, AI frees up physicians to spend more time with their patients, manage complex systems, and make decisions.
AI isn’t coming for physicians’ jobs, but as many advocates have said, physicians who use AI will likely replace those who don’t. And those who help develop it may fare best of all, because they can ensure the new tools address their problems, help their patients, and are as safe, accurate, and reliable as they can be. The risks are many, as even the biggest boosters acknowledge. But dooming the whole enterprise out of fear of what could go wrong only underscores the critical need for physicians to take an active role. “It’s going to be here whether we like it or not,” Choudhary says. “If we just leave it to engineers and others, there’s a risk that it will not deliver what we want it to deliver.”
MacLean adds: “It’s healthy to be skeptical of any technology. But then it’s equally healthy to be nimble and innovative and forward-thinking. It’s going to require you at the table to chart the future of medicine. But I think the future of medicine is very, very bright, in part because of AI.”