Atropos Health’s Post

Few large language models (LLMs) used in healthcare are evaluated in scenarios that align with clinical practice. That’s where the MedHELM tool and Atropos Health co-founder Nigam Shah, Chief Data Scientist at Stanford Health Care and Stanford University professor, come into play.

In a recent article for STAT, Dr. Shah discusses the development of MedHELM, a tool designed to evaluate the performance of LLMs within a healthcare context. Modeled after the Holistic Evaluation of Language Models (HELM) that Stanford developed for general AI uses, MedHELM tests LLMs on clinical tasks reviewed by practicing clinicians and grouped into five broad areas: clinical decision support, clinical note generation, patient communication and education, medical research assistance, and administration and workflow.

Having started by testing six LLMs on 121 individual tasks, Dr. Shah and the team aim to expand the tool to incorporate additional datasets, tasks, and models. The tool is available to anyone who wants to run it on their own datasets or contribute to the collective evaluation of models.

Learn more about Dr. Shah’s leadership in driving responsible development and implementation of artificial intelligence (AI) for healthcare: https://hubs.li/Q03dS-Xm0

#AtroposHealth #HealthcareInnovation #RealWorldData #RWD #HealthcareTechnology #AIinHealthcare #HealthTech #HealthcareIndustry #DataScience #LLMs #LargeLanguageModels #AImodels #GenerativeAI #GenAI
