ChatGPT Health Misses Half of Medical Emergencies, Study Finds

Published: 26 February 2026

What happened: The first independent safety evaluation of OpenAI’s ChatGPT Health feature found that it under-triaged more than half of genuine medical emergencies, recommending home care or routine appointments instead of immediate hospital attendance in 51.6% of urgent cases. The findings were published in the journal Nature Medicine.

Why it matters: Researchers created 60 realistic patient scenarios, each assessed by three independent doctors, and collected nearly 1,000 responses from the platform. In one asthma case, ChatGPT Health advised waiting despite identifying early signs of respiratory failure; in a suffocation simulation, it directed the patient to a future appointment 84% of the time. The platform was also nearly 12 times more likely to downplay symptoms when a “friend” in the scenario said there was nothing to worry about.

Wider context: ChatGPT Health, launched in January 2026, allows users to connect medical records and wellness apps for personalised health advice. More than 40 million people reportedly ask ChatGPT health-related questions every day. The researchers also found that adding routine lab results to a suicidal patient’s profile caused a crisis intervention banner to disappear entirely across all 16 test attempts — a failure the lead author described as “arguably more dangerous than having no guardrail at all.”

Background: OpenAI said the study did not reflect typical real-world usage and that the model is continuously updated. Outside experts, however, called for urgent safety standards and independent auditing. Legal liability was also flagged as a concern, with several lawsuits already in motion against tech companies over AI-related self-harm incidents.


Singularity Soup Take: A tool used by tens of millions for health decisions that has a coin-flip chance of recognising a life-threatening emergency isn’t a beta feature — it’s a public health risk that demands mandatory independent auditing before deployment at scale.

Key Takeaways:

  • Under-Triage Rate: ChatGPT Health failed to recommend emergency care in 51.6% of cases where doctors agreed it was immediately necessary, according to the Nature Medicine study.
  • Social Influence Flaw: The platform was nearly 12 times more likely to downplay a patient’s symptoms when a third party in the scenario suggested they were not serious — a significant susceptibility to social framing.
  • Broken Crisis Guardrail: A safety banner for suicidal ideation, which appeared 100% of the time for a test patient, vanished completely in all 16 follow-up tests when normal lab results were added to the same scenario.
  • Over-Triage Also a Problem: In 64.8% of scenarios with no genuine medical need, the platform directed the simulated patient to seek immediate care, raising concerns about unnecessary strain on health services alongside the false reassurance of under-triage.