Authors: Wilson Lau (Truveta, Inc, Bellevue, WA); Youngwon Kim (Truveta, Inc, Bellevue, WA); Sravanthi Parasa (Swedish Medical Center, Seattle, WA); Md Enamul Haque (Truveta, Inc, Bellevue, WA); Anand Oka (Truveta, Inc, Bellevue, WA); Jay Nanduri (Truveta, Inc, Bellevue, WA)
Key points
- In this study, researchers from Truveta applied machine learning and large language models to predict early-onset colorectal cancer (ages 18–44) using Truveta Data.
- A fine-tuned GPT-4o model achieved 73% sensitivity and 91% specificity, outperforming traditional machine learning approaches in balancing detection and false positives.
- The model used conditions, labs, and observations from up to six months prior to diagnosis to identify patients at elevated risk.
- Findings suggest LLM-based approaches may help identify higher-risk patients earlier and support more targeted screening strategies in younger populations.
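To make the six-month lookback mentioned above concrete, here is a minimal sketch of how records might be filtered to a window before a diagnosis date. It is an illustration under stated assumptions, not the study's pipeline: the function name `events_in_lookback`, the 183-day approximation of six months, and the example records are all hypothetical.

```python
from datetime import date, timedelta

# Assumption: "six months" is approximated as 183 days for illustration.
LOOKBACK_DAYS = 183

def events_in_lookback(events, index_date, lookback_days=LOOKBACK_DAYS):
    """Return the events dated inside the lookback window before index_date.

    events: list of (event_date, description) tuples.
    The window is [index_date - lookback_days, index_date), so anything
    recorded on or after the index date is excluded.
    """
    window_start = index_date - timedelta(days=lookback_days)
    return [(d, desc) for d, desc in events if window_start <= d < index_date]

# Made-up records for demonstration only; they are not taken from the study.
example_events = [
    (date(2023, 1, 10), "condition: example condition A"),
    (date(2023, 5, 2), "lab: example lab result B"),
    (date(2022, 6, 1), "observation: example observation C"),   # too old
    (date(2023, 6, 15), "condition: recorded on the index date"),  # excluded
]

selected = events_in_lookback(example_events, index_date=date(2023, 6, 15))
for event_date, description in selected:
    print(event_date.isoformat(), description)
```

Excluding the index date itself is a design choice in this sketch so that only information available before the diagnosis is kept.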
Abstract
The incidence rate of early-onset colorectal cancer (EoCRC, age < 45) has increased every year, but these patients are younger than the age at which national guidelines recommend starting cancer screening.
In this paper, researchers from Truveta applied 10 different machine learning models to predict EoCRC and compared their performance with advanced large language models (LLMs), using patient conditions, lab results, and observations recorded in the 6 months before the CRC diagnosis. They retrospectively identified 1,953 CRC patients from multiple health systems across the United States.
The results demonstrated that the fine-tuned LLM achieved an average of 73% sensitivity and 91% specificity.
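For readers less familiar with these metrics, the sketch below shows how sensitivity and specificity are computed from confusion-matrix counts. The counts are hypothetical, chosen only so the arithmetic lands on the reported 73% and 91%; they are not the study's actual patient numbers, and the function name is an illustrative assumption.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Compute sensitivity and specificity from confusion-matrix counts.

    sensitivity = TP / (TP + FN): share of true cases the model flags.
    specificity = TN / (TN + FP): share of non-cases the model clears.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity

# Hypothetical counts (100 cases, 100 non-cases), not data from the study.
sens, spec = sensitivity_specificity(tp=73, fn=27, tn=91, fp=9)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this illustration, 73% sensitivity means 27 of every 100 true cases would be missed, and 91% specificity means 9 of every 100 patients without cancer would be flagged.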
Read the full study
Predicting Early-Onset Colorectal Cancer with Large Language Models, AMIA Annual Symposium Proceedings

