A Quiet Revolution in Psychiatry

Beyond simply getting the "right" medications, the decision support AI demonstrated a critical advantage over humans: it did not make any harmful choices.

For decades, the clinical management of treatment-resistant depression has been a formidable challenge. While first-line pharmacologic treatments are well-established, there is a distinct lack of evidence regarding the effectiveness of next-step treatments. This has left psychiatrists and general practitioners to navigate a complex landscape with limited guidance. The problem is particularly acute for community clinicians who don't see a high volume of these complex cases, leading to wide variations in practice and potentially suboptimal care. The traditional solution—expert consultation—is often inaccessible, especially in under-resourced areas.

This long-standing problem has seen various attempts at a solution, from computerised algorithms to app-based care. Yet none of these older methods has seen wide adoption, and the gap matters: the US Preventive Services Task Force has noted that screening for depression is useful only where resources to deliver care actually exist. This is where a new approach, powered by artificial intelligence, is beginning to change the conversation.

A New Kind of Decision Support

A recent study published in the Journal of Mood & Anxiety Disorders explored a novel use of a large language model (LLM) to assist in selecting psychopharmacological treatments. The research, led by Roy H. Perlis and a team of colleagues, aimed to apply and extend a method previously used for bipolar depression treatment.

The study's core method involved generating 20 clinical vignettes reflecting treatment-resistant depression, based on data from electronic health records. Two expert psychopharmacologists evaluated each vignette, ranking the five best next-step interventions and flagging treatments they considered poor or contraindicated. This created a "gold standard" against which the AI's performance was measured.
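
As a rough illustration of how such a comparison can be scored, the sketch below computes two metrics over hypothetical ranked lists: how often the model's top pick matches the experts' top pick, and how often any of the model's recommendations falls on the experts' poor or contraindicated list. The vignette IDs, medication names, and data structures are invented for illustration; the study's actual scoring may differ.

```python
# Illustrative scoring of model recommendations against an expert "gold standard".
# All vignette IDs, medication names, and rankings here are invented examples.

expert_rankings = {
    "vignette_01": {
        "best": ["lithium augmentation", "aripiprazole augmentation",
                 "switch to bupropion", "add mirtazapine", "esketamine"],
        "poor": ["increase benzodiazepine dose", "add tramadol"],
    },
    # ... one entry per vignette
}

model_rankings = {
    "vignette_01": {
        "best": ["aripiprazole augmentation", "lithium augmentation",
                 "esketamine", "switch to bupropion", "add mirtazapine"],
        "poor": ["increase benzodiazepine dose", "add tramadol"],
    },
}

def score(expert: dict, model: dict) -> tuple[float, float]:
    """Return (top-choice concordance, poor-choice rate) over shared vignettes."""
    shared = expert.keys() & model.keys()
    top_hits = sum(model[v]["best"][0] == expert[v]["best"][0] for v in shared)
    poor_picks = sum(
        any(choice in expert[v]["poor"] for choice in model[v]["best"])
        for v in shared
    )
    n = len(shared)
    return top_hits / n, poor_picks / n

top_rate, harm_rate = score(expert_rankings, model_rankings)
print(f"Optimal-choice concordance: {top_rate:.1%}, poor-choice rate: {harm_rate:.1%}")
```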

The AI model used was Qwen 2.5:7B-Instruct, a locally-run, open-source LLM. This particular model was chosen for its potential translational advantages: it can be deployed on a laptop, addressing concerns about data security and dissemination. The model was "augmented" with a synopsis of published treatment guidelines, specifically extracts from the CANMAT 2023 Depression Update, and then prompted to generate its own ranked list of the five best next-step interventions and five poor or contraindicated ones.

The results were compelling. The augmented model selected the expert-designated optimal choice for 35.6% of vignettes. While this might seem modest, it compared favourably to the human comparison groups: a sample of community clinicians identified the optimal treatment just 13.2% of the time, and a separate group of expert psychopharmacologists identified it for only 6.4% of vignettes.
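
To make the setup concrete, here is a minimal sketch of what guideline-augmented prompting against a locally hosted model could look like. It assumes the ollama runtime with a locally pulled qwen2.5:7b-instruct model; the guideline extract, vignette, and prompt wording are placeholders rather than the study's actual materials.

```python
# Sketch of guideline-augmented prompting against a locally hosted Qwen model.
# Assumes ollama is running locally with the qwen2.5:7b-instruct model pulled;
# the guideline extract and vignette below are placeholders, not study materials.
import ollama

guideline_extract = (
    "CANMAT 2023 update, next-step options after inadequate response: "
    "adjunctive aripiprazole, lithium augmentation, antidepressant switch, ..."
)

vignette = (
    "A 52-year-old patient with major depressive disorder has not responded to "
    "adequate trials of sertraline and venlafaxine. Current PHQ-9 score is 18."
)

prompt = (
    f"Treatment guidelines:\n{guideline_extract}\n\n"
    f"Clinical vignette:\n{vignette}\n\n"
    "List the five best next-step pharmacologic interventions, ranked, and then "
    "five interventions that would be poor choices or contraindicated."
)

response = ollama.chat(
    model="qwen2.5:7b-instruct",
    messages=[{"role": "user", "content": prompt}],
)
print(response["message"]["content"])
```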

A Crucial Safety Net

Beyond simply getting the "right" answer, the AI demonstrated a critical advantage: it was significantly less likely to make a harmful choice. The augmented model did not select any medications considered poor or contraindicated by the experts. In stark contrast, community clinicians selected a poor choice in 33.0% of vignettes, and the second group of experts did so in 10.7% of cases.

This finding is a game-changer. It suggests that even if an AI doesn't always provide the single best answer, it can act as a reliable safety net, steering clinicians away from poor or contraindicated treatment options. This could lead to a decrease in adverse outcomes associated with prescribing, hinting at the potential utility of LLMs for clinical decision support.

The study's findings also address the issue of potential bias. The vignettes were permuted to reflect different gender and race pairings (Black men, Black women, White men, White women). The model's performance was consistent across all these subgroups, suggesting no detectable bias in its recommendations.
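
A permutation check of this kind can be sketched as follows: the same vignette is rendered for each gender and race pairing and the model's top recommendation is compared across versions. The vignette template, subgroup labels, and the `recommend` placeholder are illustrative only, not the study's protocol.

```python
# Illustrative demographic permutation check: render the same clinical vignette
# for each gender/race pairing and compare the model's top recommendations.
# The template and the recommend() function are placeholders, not the study's code.
from itertools import product

TEMPLATE = (
    "A 52-year-old {race} {gender} with major depressive disorder has not responded "
    "to adequate trials of sertraline and venlafaxine. Current PHQ-9 score is 18."
)

def recommend(vignette_text: str) -> list[str]:
    """Placeholder for a call to the guideline-augmented model (see sketch above)."""
    return ["aripiprazole augmentation", "lithium augmentation"]

results = {}
for race, gender in product(["Black", "White"], ["man", "woman"]):
    vignette = TEMPLATE.format(race=race, gender=gender)
    results[(race, gender)] = recommend(vignette)

# Flag any demographic pairing whose top recommendation differs from the others.
top_choices = {pair: recs[0] for pair, recs in results.items()}
consistent = len(set(top_choices.values())) == 1
print("Top choice consistent across subgroups:", consistent)
```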

Broader Implications

The study highlights a significant opportunity for AI to reduce the variability in care that exists even among expert clinicians. By providing a more standardised, evidence-informed decision-making process, these tools could help overcome the "algorithmic divide" where the quality of care depends on a clinician's personal experience or location.

This approach is framed not as a replacement for clinicians, but as a new tool to support them. One potential application would be to integrate the model with electronic health records, providing a clinician with suggested treatment options and medications to avoid at the point of care. This could be particularly beneficial in settings where access to psychiatric expertise is limited, offering a level of support that was previously unavailable.

The study's finding that the locally-run model performed similarly to a more powerful, cloud-based model (GPT-4) is also promising. This suggests that smaller, more accessible language models could be feasible for clinical use, addressing major concerns about data transmission and confidentiality.

While this is a foundational step, the next phase will involve prospective investigations in real clinical settings. Such studies could compare the AI's silent predictions to actual treatment decisions, ultimately aiming for larger randomised trials to demonstrate the effectiveness of such tools. The findings offer a tangible path forward, demonstrating that AI can act as a powerful force for standardising and improving healthcare, especially in complex fields where human expertise is limited and variable.