JMIR Mental Health study shows that GPT-4 predicts mental health crises with accuracy similar to that of trained clinicians, but with higher sensitivity and lower specificity
SAN FRANCISCO, CA – August 5, 2024 – Telemental health company Brightside Health today announced results of its peer-reviewed study evaluating the performance of a large language model (LLM) in predicting current and future mental health crisis episodes. The research, published in JMIR Mental Health, showed that OpenAI’s GPT-4 identified and predicted a mental health crisis (endorsement of suicidal ideation with plan) with accuracy similar to that of trained clinicians, though with higher sensitivity and lower specificity.
“In line with our commitment to utilize AI in a safe and controlled manner, this research highlights the potential of large language models for triage and clinical decision support in mental health,” said Dr. Mimi Winsberg, Co-Founder and Chief Medical Officer of Brightside Health. “While clinical oversight remains paramount, technologies such as these can help alleviate provider time shortages and empower providers with risk assessment tools, which is especially crucial for patients at risk of suicide.”
While use cases for AI in healthcare have grown, there has been limited research on using AI for mental health crisis prediction. This peer-reviewed study analyzed data from 460 patients on the Brightside Health telehealth platform: 260 who reported suicidal ideation with a plan to act on it, and 200 who did not endorse suicidal ideation. Six clinicians and GPT-4 were asked to predict the emergence of suicidal ideation with plan based only on the patient’s chief complaint, in free text, without access to any other patient information. Key results from the study include:
- Similar accuracy of LLMs compared to clinicians: Overall accuracy – i.e., correctly assigning suicidal ideation with plan vs. no suicidal ideation across the 460 examples – ranged from 55.2% to 67% among the six clinicians using the chief complaint alone, with an average of 62.1%. The GPT-4-based model achieved 61.5% accuracy. (A sketch of how such an evaluation can be run and scored follows this list.)
- Time savings of LLMs: Data showed the average clinician took over three hours to evaluate the 460 samples of text provided, while GPT-4 completed the full evaluation in less than 10 minutes.
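The published paper does not include its prompt or scoring code. Purely as an illustration of the evaluation design described above, the Python sketch below classifies a chief-complaint string with the OpenAI API and then derives accuracy, sensitivity, and specificity from the resulting predictions. The prompt wording, the predict_crisis and evaluate names, and the zero-temperature setting are assumptions for illustration, not the study’s actual protocol.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def predict_crisis(chief_complaint: str) -> bool:
    """Hypothetical binary triage call; the prompt is illustrative, not the study's."""
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0,  # deterministic output for evaluation (an assumption)
        messages=[
            {
                "role": "system",
                "content": (
                    "Given a patient's chief complaint, answer YES if it "
                    "suggests suicidal ideation with a plan, otherwise NO."
                ),
            },
            {"role": "user", "content": chief_complaint},
        ],
    )
    return response.choices[0].message.content.strip().upper().startswith("YES")

def evaluate(predictions: list[bool], labels: list[bool]) -> dict[str, float]:
    """Compute accuracy, sensitivity, and specificity from paired predictions and labels."""
    tp = sum(p and y for p, y in zip(predictions, labels))          # true positives
    tn = sum(not p and not y for p, y in zip(predictions, labels))  # true negatives
    fp = sum(p and not y for p, y in zip(predictions, labels))      # false positives
    fn = sum(not p and y for p, y in zip(predictions, labels))      # false negatives
    return {
        "accuracy": (tp + tn) / len(labels),
        "sensitivity": tp / (tp + fn),  # share of true crises flagged
        "specificity": tn / (tn + fp),  # share of non-crises correctly cleared
    }
```

With the study’s 260/200 split of positive to negative cases, a classifier that flags borderline complaints as crises will tend to show exactly the pattern reported: higher sensitivity (fewer missed crises) at the cost of lower specificity (more false alarms).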
This research suggests that tools such as GPT-4 hold promise for aiding clinicians in delivering mental health care, including timely care for higher-acuity, higher-severity patients. It also adds to Brightside Health’s growing body of research, including a peer-reviewed study published in JMIR Formative Research demonstrating a reduction in suicidal ideation with treatment on a telehealth platform.
To access the full research, visit https://mental.jmir.org/2024/1/e58129. For more information on Brightside Health, visit www.brightside.com.