Doctors have been understandably skeptical of claims that artificial intelligence will transform medicine. Recall the misleading claims that radiologists might soon be obsolete, years of annoying automated pop-ups in electronic health records, and the deployment of IBM’s near-useless Watson. But new large language models may actually live up to the hype. GPT-4, the latest, largest, and most capable such model developed by OpenAI, aces many AP exams and passes various professional certification exams (even tests for sommeliers) without having been specifically trained for any of them.
In medicine, GPT-4 comfortably clears the series of exams that aspiring physicians must pass to obtain a medical license, answering 83 percent of questions correctly. (The minimum passing score is roughly 60 percent.) It also achieves impressive results on a board-review resource designed to prepare physicians for the American Board of Internal Medicine exam. And when its answers to 195 patient questions were compared with physicians’ answers, a team of blinded health-care professionals rated GPT-4’s responses more highly on empathy, though accuracy wasn’t assessed.
No state has yet granted GPT-4 a license to practice medicine independently, but the tool seems poised to change medical practice in many ways. Given this model’s tendency to “hallucinate” incorrect information, however, as well as its out-of-date knowledge and the cautions issued by its maker, a human doctor would probably have to supervise its use: call it a “doctor-in-the-loop.” Even so, GPT-4 could streamline a great deal of medical work. Giving the tool sufficient context, explicit and detailed instructions (or “prompts”), and, eventually, some degree of Internet access will likely improve its performance beyond an already impressive baseline.
Consider first some behind-the-scenes uses. GPT-4 could likely speed up administrative workflows by automating the completion of prior-authorization forms for medications and of appeal letters to insurance companies that deny care. For health-care systems, it could provide more insight into opaque medical records: as health-tech entrepreneur Will Manidis puts it, AI models will make “data computable” by translating messy patient records into more usable formats.
For physicians overwhelmed with patient messages, automatically drafted responses, along with helpful summaries, will be a welcome addition; a pilot project is already under way at a number of health-care institutions. A hypothetical “digital scribe,” meanwhile, would combine speech-transcription technology with GPT-4 to listen to patient-physician visits and automatically generate notes. Those notes could then be scanned for applicable billing codes to maximize revenue, a boon for physicians, though not one that payers like insurance companies and Medicare are likely to welcome.
For patients who struggle with complicated terminology or lack English proficiency, the ability to translate medical notes into digestible formats in a variety of languages will be helpful. In time, patients and doctors might be able to pose questions to medical records and get back contextualized answers, much as the financial-technology company Stripe is doing with its developer documentation.
Some uncertainties remain. What effect will reducing some health-care transaction costs have on overall system costs? As policy analyst Samuel Hammond writes in a recent blog post, “Forecasting the near-term impact of AI thus requires a theory of which transaction costs will fall and what other transaction costs will rise.” One mechanism for constraining health-care costs has been “utilization management”: limiting access to expensive medications, surgeries, and tests by hiding them behind byzantine insurance-company paperwork, especially in Medicare Advantage plans. Seen this way, high transaction costs can be a plus from a system perspective: patients who truly need expensive treatments can eventually get them, but cheaper alternatives are tried first, keeping overall costs down. In practice, however, the process can be immensely frustrating, and horror stories occasionally emerge of much-needed treatment being denied or delayed.
What happens when the cost of writing perfectly formatted prior authorizations and appealing any subsequent insurance denials falls to almost zero? The first-order effects are likely to be cost-saving: staff time dedicated to insurance-mandated paperwork will decrease substantially, freeing up time for other work and improving the patient experience. Predicting the second-order effects, however, requires some context. About 94 percent of submitted prior authorizations are approved. Of those that are denied, only 11 percent are subsequently appealed, and of those appeals, about 80 percent result in partial or full authorization. The net effect will probably be a modest rise in the share of prior authorizations approved on the first try and a large increase in the number that are appealed and then authorized. Insurance companies may also use systems like GPT-4 to audit prior authorizations, in place of the crude automated systems some use now.
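To make those rates concrete, here is a back-of-the-envelope funnel in Python, a minimal sketch built on the figures cited above (94 percent approved, 11 percent of denials appealed, about 80 percent of appeals succeeding). The function name `appeal_funnel` and the scenario in which every denial is appealed are hypothetical, and the sketch assumes, purely for illustration, that the 80 percent overturn rate would hold even as appeal volume grows.

```python
def appeal_funnel(n_requests: float, approval_rate: float,
                  appeal_rate: float, overturn_rate: float) -> dict:
    """Hypothetical helper: counts at each stage of a simple prior-authorization funnel."""
    denied = n_requests * (1 - approval_rate)   # requests rejected on first submission
    appealed = denied * appeal_rate             # denials that someone bothers to appeal
    overturned = appealed * overturn_rate       # appeals ending in partial or full authorization
    return {"denied": denied, "appealed": appealed, "overturned": overturned}


# Rates cited in the text, applied to 1,000 requests.
today = appeal_funnel(1000, 0.94, 0.11, 0.80)

# Hypothetical scenario: near-free AI-drafted appeals push the appeal rate to 100%.
# ASSUMPTION: the overturn rate stays at 80%, which may not hold in reality.
cheap_appeals = appeal_funnel(1000, 0.94, 1.00, 0.80)

for label, f in [("today", today), ("cheap appeals", cheap_appeals)]:
    print(f"{label}: {f['denied']:.0f} denied, "
          f"{f['appealed']:.1f} appealed, {f['overturned']:.1f} authorized on appeal")
```

Under current rates, roughly 5 of every 1,000 requests are authorized on appeal; if every denial were appealed and the overturn rate held (a big if), that figure would climb to about 48, roughly nine times as many.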
AI’s capabilities will only improve. GPT-4 performs well even before any task-specific training or real-time access to extant medical databases or various professional guidelines. As new business models and regulation develop with GPT-4 and its successors in mind, much larger changes in health care are possible.