
ASCO 2026: Multi-agent LLM framework predicts one-year risk across multiple cancers

May 29, 2026

Authors: Sihang Zeng, Truveta, Inc, Bellevue, WA; Youngwon Kim, PhD, Truveta, Inc, Bellevue, WA; Wilson Lau, PhD, Truveta, Inc, Bellevue, WA; Ruth Etzioni, PhD, Fred Hutch Cancer Center, Seattle, WA; Meliha Yetisgen, PhD, University of Washington, Seattle, WA; Anand Oka, PhD, Truveta, Inc, Bellevue, WA; Jay Nanduri, MBA, MS, Truveta, Inc, Bellevue, WA

  • TrajOnco, a multi-agent large language model (LLM) framework, estimated one-year risk across 15 cancer types directly from longitudinal electronic health records.
  • The system achieved clinically meaningful discrimination across cancers, with AUROCs ranging from 0.64 to 0.80. Liver cancer had the strongest performance, followed closely by lung cancer.
  • In the lung cancer benchmark cohort, TrajOnco reached an AUROC of 0.79, comparable to trained machine learning models while requiring no cancer-specific model training.
  • Beyond prediction, TrajOnco generated patient summaries and interpretable rationales that can support real-world evidence generation and exploratory knowledge discovery.

This report builds on our ASCO oral presentation, Predicting multi-cancer risk from EHR data using multi-agent LLMs, as part of the TrajOnco project.

Accurate cancer risk prediction is essential for identifying patients who may benefit from earlier screening, diagnostic follow-up, or closer monitoring. Yet predicting risk across multiple cancers remains challenging. Electronic health records contain rich longitudinal information, but those data are heterogeneous, noisy, and often spread across years of conditions, labs, medications, procedures, and observations.

Traditional machine learning approaches can perform well, but they usually require disease-specific training, feature engineering, and substantial preprocessing. This limits scalability when the goal is not one cancer model, but a general framework that can reason across many cancer types.

TrajOnco addresses this challenge using a multi-agent LLM system with episodic memory. Rather than training a separate model for each cancer, TrajOnco processes a patient’s longitudinal electronic health record (EHR) history, synthesizes the clinical trajectory, and generates cancer-specific one-year risk scores with interpretable rationales. The accompanying TrajOnco paper describes the framework as a training-free, multi-agent LLM approach for scalable multi-cancer early detection using longitudinal EHR data.

Methods

Using Truveta Data, we identified cases for 15 cancer types with clinically recognized precursor signs using clinician-curated diagnostic codes: lung, ovarian, liver, pancreatic, colorectal, multiple myeloma, lymphoma, gastric, bladder, leukemia, prostate, esophageal, endometrial, breast, and cervical cancer.

For each cancer type, 500 cases were randomly sampled and matched 1:1 with controls by 10-year age group and sex. All structured EHR data recorded before the time of prediction (one year before diagnosis or the matched index date for controls) were included. The input data comprised conditions, laboratory results, observations, medications, and procedures.
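The cohort construction described above can be illustrated with a minimal sketch. It assumes simple in-memory records; the field names (`id`, `age`, `sex`, `date`, `code`), the function names, and the 365-day gap are hypothetical choices made for illustration, not the study's actual pipeline.

```python
import random
from collections import defaultdict
from datetime import date, timedelta


def age_band(age):
    """Map an age in years to the start of its 10-year band (67 -> 60)."""
    return (age // 10) * 10


def match_controls(cases, candidates, seed=0):
    """Pair each case with one unused control from the same age band and sex.

    Cases with no remaining candidate in their stratum are skipped.
    """
    rng = random.Random(seed)
    pool = defaultdict(list)
    for person in candidates:
        pool[(age_band(person["age"]), person["sex"])].append(person)
    for stratum in pool.values():
        rng.shuffle(stratum)  # random choice within each stratum

    pairs = []
    for case in cases:
        stratum = pool[(age_band(case["age"]), case["sex"])]
        if stratum:
            pairs.append((case["id"], stratum.pop()["id"]))
    return pairs


def history_before_prediction(events, index_date, gap_days=365):
    """Keep events dated strictly before the prediction time.

    The prediction time is assumed to be `gap_days` before the index date
    (diagnosis for cases, matched index date for controls).
    """
    cutoff = index_date - timedelta(days=gap_days)
    return [e for e in events if e["date"] < cutoff]


cases = [{"id": "c1", "age": 67, "sex": "F"}, {"id": "c2", "age": 52, "sex": "M"}]
candidates = [
    {"id": "k1", "age": 61, "sex": "F"},
    {"id": "k2", "age": 58, "sex": "M"},
    {"id": "k3", "age": 45, "sex": "M"},
]
pairs = match_controls(cases, candidates)

events = [
    {"date": date(2022, 3, 1), "code": "lab:ALT"},
    {"date": date(2023, 9, 15), "code": "dx:cough"},
]
kept = history_before_prediction(events, index_date=date(2024, 6, 1))
print(pairs, [e["code"] for e in kept])
```

Matching within strata keeps the age and sex distributions of cases and controls aligned, and the date filter ensures that nothing recorded in the final year before the index date reaches the model.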

TrajOnco used a chained multi-agent architecture. Worker agents processed sequential portions of each patient’s longitudinal record and stored salient clinical evidence in episodic memory. A manager agent then synthesized the trajectory into a concise patient summary, cancer-specific one-year risk scores on a 1–10 scale, and an interpretable rationale.
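The control flow of this chained design can be sketched as follows. This is only an assumed outline: the keyword rules below are placeholders standing in for LLM calls, and every name (`EpisodicMemory`, `worker_agent`, `manager_agent`, the term lists, the scoring rule) is hypothetical rather than taken from the TrajOnco implementation.

```python
from dataclasses import dataclass, field

# Hypothetical keyword lists standing in for LLM judgment.
SALIENT_TERMS = {"cirrhosis", "hepatitis", "smoking", "weight loss"}
RELEVANT = {"liver": {"cirrhosis", "hepatitis"}, "lung": {"smoking"}}


@dataclass
class EpisodicMemory:
    """Shared store that carries salient evidence from chunk to chunk."""
    notes: list = field(default_factory=list)

    def add(self, note):
        self.notes.append(note)


def chunk_record(events, chunk_size):
    """Split a chronologically ordered event list into sequential chunks."""
    return [events[i:i + chunk_size] for i in range(0, len(events), chunk_size)]


def worker_agent(chunk, memory):
    """Placeholder worker: store events that mention a salient term."""
    for event in chunk:
        if any(term in event for term in SALIENT_TERMS):
            memory.add(event)


def manager_agent(memory, cancers):
    """Placeholder manager: build a summary, 1-10 scores, and rationales."""
    scores, rationale = {}, {}
    for cancer in cancers:
        terms = RELEVANT.get(cancer, ())
        evidence = [n for n in memory.notes if any(t in n for t in terms)]
        scores[cancer] = min(10, 1 + 3 * len(evidence))  # toy scoring rule
        rationale[cancer] = evidence
    return {
        "summary": "; ".join(memory.notes),
        "risk_scores": scores,
        "rationale": rationale,
    }


def run_pipeline(events, cancers, chunk_size=2):
    memory = EpisodicMemory()
    for chunk in chunk_record(events, chunk_size):
        worker_agent(chunk, memory)  # workers run in sequence
    return manager_agent(memory, cancers)


events = [
    "2019: hepatitis C",
    "2020: routine visit",
    "2021: cirrhosis",
    "2022: former smoking noted",
]
result = run_pipeline(events, ["liver", "lung"])
print(result["risk_scores"])
```

The point of the structure is that no single call has to read the whole record: each worker sees one chunk, and only the evidence it writes to memory reaches the manager.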

Performance was evaluated across the 15 cancer types using area under the receiver operating characteristic curve (AUROC), where higher values indicate stronger discrimination between patients who develop cancer and matched controls. The lung cancer cohort was also used as a benchmark for comparison with traditional machine learning models and a single-agent LLM baseline.
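As an illustration of the metric, AUROC can be computed directly from risk scores as the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half. The sketch below uses made-up 1–10 scores; it is not the study's evaluation code.

```python
def auroc(case_scores, control_scores):
    """Fraction of case-control pairs ranked correctly; ties count half."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Made-up scores on the 1-10 scale, for illustration only.
case_scores = [8, 6, 7, 3]
control_scores = [2, 6, 4, 3]
value = auroc(case_scores, control_scores)
print(value)  # 13 of 16 pairs (ties as half) -> 0.8125
```

Because the risk scores are integers from 1 to 10, ties between a case and a control are common, so the half-credit handling of ties matters more here than it would with continuous scores.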

Diagram of a multi-agent framework for analyzing longitudinal EHR data. Panel a shows longitudinal EHR XML chunks processed by multiple worker agents and stored in long-term memory, which feed a manager agent that produces predicted risk and patient-level summary output. Panel b shows input EHR data elements, a one-year washout period, and the timing of prediction and cancer diagnosis. Panel c shows individual patient summaries combined into population-level insights.
Fig 1. TrajOnco framework and study design. a. TrajOnco architecture; b. Study design; c. Patient-level summaries can be aggregated to generate population-level insights

Results

Multi-cancer discrimination

TrajOnco demonstrated heterogeneous but clinically meaningful performance across cancer types. AUROC values ranged from 0.64 to 0.80, with the highest discrimination for liver cancer and the lowest for colorectal cancer.

Bar chart showing AUROC by cancer type for TrajOnco. Performance is highest for liver cancer (0.80) and lung cancer (0.79), then leukemia (0.75) and prostate cancer (0.74). Breast, bladder, cervical, multiple myeloma, esophageal, endometrial, lymphoma, gastric, ovarian, pancreatic, and colorectal cancers follow, with colorectal lowest at 0.64. Error bars indicate variability around each estimate.
Fig 2. TrajOnco shows varied performance on different cancer types.

These results suggest that a unified multi-agent LLM framework can detect meaningful prediagnostic signals across diverse cancers, even without cancer-specific training.

Lung cancer benchmark

In the lung cancer cohort, TrajOnco achieved an AUROC of 0.79, comparable to trained machine learning models. XGBoost performed best with an AUROC of 0.80, followed by logistic regression at 0.73 and k-nearest neighbors at 0.58.

This benchmark is important because it evaluates TrajOnco against models trained specifically for the task. While XGBoost remained modestly higher, TrajOnco achieved competitive discrimination while preserving a training-free workflow and producing natural-language summaries and rationales.

Interpretable reasoning and downstream use

Beyond risk scores, TrajOnco generated concise patient-level summaries and evidence-linked rationales. These outputs help make the prediction process more transparent and support downstream analyses, including cancer real-world evidence generation and exploratory knowledge discovery. For example, a topic modeling framework can identify top themes for each cancer from the patient-level summaries.

Grid of small bar charts showing the prevalence of selected clinical and demographic features by cancer type. Each panel is labeled by cancer type and highlights the five most common associated features, such as metabolic syndrome, obesity, diabetes, smoking, and site-specific symptoms or conditions. The figure shows that the dominant features vary by cancer type, with obesity-related, metabolic, and condition-specific patterns appearing across the cohorts.
Fig 3. Top 5 themes of each cancer type identified from TrajOnco’s summaries.
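The aggregation from patient-level summaries to per-cancer themes can be sketched in a simplified form. Note that this is keyword counting, not topic modeling, and it is used here only as a stand-in to show the shape of the aggregation; the theme list, keyword stems, and example summaries are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical themes and keyword stems; a real topic model would learn these.
THEME_KEYWORDS = {
    "smoking": ["smok", "tobacco"],
    "diabetes": ["diabet"],
    "obesity": ["obes"],
}


def top_themes(summaries, k=5):
    """Count each theme at most once per summary and rank within each cancer.

    `summaries` is a list of (cancer_type, summary_text) pairs.
    """
    counts = defaultdict(Counter)
    for cancer, text in summaries:
        lowered = text.lower()
        for theme, stems in THEME_KEYWORDS.items():
            if any(stem in lowered for stem in stems):
                counts[cancer][theme] += 1
    return {cancer: c.most_common(k) for cancer, c in counts.items()}


# Made-up summaries, for illustration only.
summaries = [
    ("lung", "Long-term smoker with type 2 diabetes."),
    ("lung", "Tobacco use and chronic cough."),
    ("liver", "Obesity and diabetes with elevated liver enzymes."),
]
themes = top_themes(summaries)
print(themes["lung"])
```

Counting a theme once per summary makes the totals read as the number of patients whose summary mentions that theme, which corresponds to the prevalence-style bars in the figure.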

Compared with a single-agent LLM, the multi-agent system showed stronger temporal coherence and clinical reasoning. This is consistent with the design of the framework: instead of requiring one model call to reason over a long and noisy EHR history, TrajOnco distributes the task across sequential agents and preserves clinically important events in memory.

Discussion

These findings show that a multi-agent LLM system can estimate short-term risk across multiple cancers directly from longitudinal EHR data. Three findings are especially relevant for real-world deployment.

First, TrajOnco uses a unified framework across cancer types, avoiding the need to train and maintain separate models for each disease. Second, its performance was clinically meaningful across all 15 cancers evaluated, with the strongest results for liver and lung cancer. Third, the system produces interpretable summaries and rationales, which may help bridge the gap between predictive modeling and clinical review.

Limitations include variation in performance across cancers, reliance on structured EHR data that may not capture care delivered outside contributing health systems, and the need for additional validation before clinical use. Future work will evaluate broader cancer cohorts, refine risk calibration, and explore how interpretable LLM outputs can support screening workflows and real-world evidence generation.

These findings are consistent with the data and study design described in the ASCO abstract.

Citations

  1. Zeng S, Kim Y, Lau W, Alipour E, Etzioni R, Yetisgen M, Oka A, Nanduri J. Predicting multi-cancer risk from EHR data using multi-agent LLMs. ASCO oral presentation.
  2. Zeng S, et al. TrajOnco: a multi-agent framework for temporal reasoning over longitudinal EHR for multi-cancer early detection. arXiv:2604.10386.

 
