The New York Times covered Microsoft’s Copilot Health launch last week. It’s worth reading because it reveals where the entire consumer health AI category stands right now.
First, let me say what I genuinely believe: Consumer health AI is one of the most important developments in modern medicine. Not because it is technically impressive—though it is—but because of what it could mean for the patient who can’t afford or access a specialist. The single mother navigating a diagnosis at midnight. The farmworker with no primary care physician within 50 miles. The patient whose skin color or zip code has historically predicted access to quality health information as reliably as clinical need.
To be clear: I do not believe AI should replace physicians, and I do not want it to. For patients who have ready access to a trusted clinician, that relationship remains the gold standard—full stop. But the uncomfortable truth is that for a vast and growing number of patients, that access simply does not exist. In major US metro areas, the average wait for a new-patient physician appointment is now about 31 days, with average waits of 32.7 days for cardiology, 36.5 days for dermatology, 40 days for gastroenterology, and 41.8 days for OB-GYN. For those patients, AI is not competing with a doctor. It is filling a void where no doctor was available in the first place.
For those patients, a well-built health AI platform is not a convenience. It is a lifeline.
That is exactly why we have to get this right.
The problem is not simply that AI can be wrong. It is that today’s AI health guidance is often not grounded in outcomes.
The industry is racing toward what it is calling “Medical Superintelligence.” But intelligence without evidence is not trustworthy healthcare guidance.
As the Times notes, a February study in Nature Medicine found that while leading models performed well when tested alone, members of the public using those same tools were no better than a control group at identifying the right condition or deciding what to do next. The study identified user interaction as a major challenge in real-world deployment. In one widely cited case, a 60-year-old man was hospitalized with bromide toxicity after following AI-generated guidance involving sodium bromide.
These are not just examples of model imperfection. They expose the central gap: These systems are generating health guidance without being reliably anchored to what actually happened to similar patients.
If a patient is truly similar to the population studied in a rigorous randomized controlled trial (RCT), guidance can be grounded in that evidence. But in much of real-world medicine, that is the exception, not the rule. RCTs often underrepresent or exclude the very patients who will receive therapies in routine care: older adults, pregnant people, patients with multiple comorbidities, and many historically underrepresented populations.
That is why published literature alone is not enough. In many cases, health AI needs outcomes from real-world data. It needs to know: In patients like this, under circumstances like this, what actually happened?
The stakes are not symmetrical. When a health AI product gets it wrong, it will not get it wrong equally. It will get it wrong most often for the patients who have the fewest alternatives—the ones who needed it most.
Health AI companies building in this space should want outcome-grounded evidence underneath their products. In the next phase of this market, evidentiary grounding will become a product differentiator, a trust signal, and increasingly, a liability shield. The promise of democratizing health intelligence is only as credible as the evidence it rests on.
The most durable health AI platforms will not simply be the most engaging. They will be the most evidence-informed, grounded in real patient journeys, real treatment decisions, and real outcomes across the full diversity of the population they serve.
The opportunity is extraordinary. The responsibility is equal to it. And the outcome-linked data infrastructure to do this right already exists—though much of the health AI market has not yet fully recognized it.

