
ASCO 2026: Benchmarking US oncology populations and survival using multi-health system EHR data

May 29, 2026

Authors: Hunter Hollis, MS; Amy Sullivan, MS; Samuel Gratzl, PhD; Jennifer Liang, MD; Karen Gilbert Farrar, PhD; Duy Hoang, PhD; Nina B Masters, PhD, MPH; Sunny Guin, PhD (all Truveta, Inc, Bellevue, WA)

  • Across six solid tumor types, demographic distributions in Truveta were broadly similar to those in SEER, with small differences in age, sex, and race.
  • One-year overall survival estimates in Truveta were within 2 percentage points of SEER for bladder, breast, cervical, and prostate cancers.
  • Using real-world evidence creates an opportunity to link longitudinal data about risk factors, treatment, and outcomes.

Real-world evidence has become increasingly important in oncology research because it captures how cancer is diagnosed, treated, and managed in routine clinical practice across diverse patient populations (1).

While traditional cancer registries such as the Surveillance, Epidemiology, and End Results (SEER) Program provide high-quality population-level surveillance data and remain foundational for understanding cancer incidence, mortality, and survival trends, they are often limited in clinical granularity, treatment detail, and timeliness (2, 3). In contrast, electronic health record (EHR)-derived data can provide more comprehensive and near real-time information on patient demographics, comorbidities, laboratory results, biomarker testing, treatment patterns, disease progression, and outcomes observed during everyday care delivery. The growing availability of large-scale EHR datasets has created new opportunities to complement established cancer registries and address research questions that may not be fully captured through traditional surveillance systems alone.

This report compares oncology data derived from EHR systems with SEER registry data to evaluate similarities and differences in patient characteristics, cancer prevalence, and clinical capture. We compared oncology populations in Truveta with SEER across six solid tumor types: bladder, breast, cervical, colorectal, lung, and prostate cancer. Specifically, we evaluated how closely the patient populations represented in each data source align and the extent to which overall survival estimates are comparable between the two datasets. By examining where these data sources converge and diverge, this analysis aims to inform the interpretation of real-world oncology evidence and support the appropriate use of EHR-based datasets in cancer research, clinical decision-making, and public health surveillance.

Methods

Using a subset of Truveta Data, we identified adults aged 18 years or older with a bladder, breast, cervical, colorectal, lung, or prostate cancer diagnosis between January 1, 2017, and December 22, 2025, who had at least one encounter in the year before diagnosis and had data characterizing their tumor, including stage, grade, or performance status. Diagnosis, stage, grade, and performance status were derived from clinical notes.

Patients were followed from diagnosis until death or last encounter. We compared demographic characteristics of these patients with SEER data accessed on December 22, 2025. Overall survival at one year and five years after diagnosis was estimated with the Kaplan-Meier method. Survival analysis was stratified by cancer type and subgroup, including age group, sex, and race and ethnicity.
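For readers unfamiliar with the Kaplan-Meier method, the following is a minimal illustrative sketch in Python, not the analysis code used in this study. The function names (`kaplan_meier`, `survival_at`) and the follow-up times are hypothetical toy values; patients whose follow-up ends at a last encounter rather than death are treated as censored.

```python
from collections import Counter

def kaplan_meier(durations, events):
    """Return [(time, survival)] at each distinct death time.

    durations: follow-up from diagnosis to death or last encounter (e.g. days)
    events:    1 if the patient died at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)   # every patient leaves the risk set at their own time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving this time point.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]      # drop deaths and censored patients after time t
    return curve

def survival_at(curve, horizon):
    """Read the step function: estimated survival probability at `horizon`."""
    s = 1.0
    for t, value in curve:
        if t > horizon:
            break
        s = value
    return s

# Toy follow-up times in days (hypothetical, NOT study data).
durations = [100, 200, 200, 400, 500, 800]
events    = [1,   0,   1,   0,   1,   0]

curve = kaplan_meier(durations, events)
one_year = survival_at(curve, 365)   # 2/3 for this toy cohort
```

In a stratified analysis, the same estimator would simply be applied separately to each cancer type and subgroup.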

Results

Study population

The study included 368,774 patients from Truveta and 3,563,896 patients from SEER. Within Truveta, the cohort included 10,350 patients with bladder cancer; 143,883 with breast cancer; 7,023 with cervical cancer; 42,797 with colorectal cancer; 100,190 with lung cancer; and 64,531 with prostate cancer.
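As a quick consistency check, the per-cancer counts above can be summed and converted to cohort shares. This is a small illustrative sketch using only the numbers reported in this section; the variable names are arbitrary.

```python
# Truveta cohort sizes by cancer type, as reported above.
truveta_counts = {
    "bladder": 10_350,
    "breast": 143_883,
    "cervical": 7_023,
    "colorectal": 42_797,
    "lung": 100_190,
    "prostate": 64_531,
}

total = sum(truveta_counts.values())   # 368,774, matching the reported total

# Share of the Truveta cohort contributed by each cancer type, in percent.
shares = {cancer: 100 * n / total for cancer, n in truveta_counts.items()}

for cancer, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{cancer:<11}{pct:5.1f}%")
```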

Demographic comparison with SEER

Across all six cancer types, demographic distributions in Truveta were broadly concordant with SEER. Differences in age group, sex, and most race and ethnicity categories were generally small, with absolute differences of 5 percentage points or less.

The main demographic differences were in the proportions of non-Hispanic White and Hispanic patients. Across cancer types, the proportion of non-Hispanic White patients ranged from 63% to 85% in Truveta, compared with 51% to 85% in SEER. Conversely, the proportion of Hispanic patients was lower in Truveta, ranging from 3% to 16%, compared with 7% to 25% in SEER.

One-year overall survival

Estimated one-year overall survival in Truveta was 81% for bladder cancer, 97% for breast cancer, 89% for cervical cancer, 86% for colorectal cancer, 68% for lung cancer, and 96% for prostate cancer.

Estimated one-year overall survival in SEER was 83% for bladder cancer, 96% for breast cancer, 85% for cervical cancer, 79% for colorectal cancer, 42% for lung cancer, and 96% for prostate cancer.
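The gap between the two sources can be tabulated directly from the rounded percentages reported above. The sketch below is illustrative only. Because the inputs are rounded to whole percentages, each gap may differ by about one point from a gap computed on unrounded estimates.

```python
# One-year overall survival (%), as reported above (rounded values).
truveta = {"bladder": 81, "breast": 97, "cervical": 89,
           "colorectal": 86, "lung": 68, "prostate": 96}
seer    = {"bladder": 83, "breast": 96, "cervical": 85,
           "colorectal": 79, "lung": 42, "prostate": 96}

# Signed difference in percentage points (Truveta minus SEER).
diff = {cancer: truveta[cancer] - seer[cancer] for cancer in truveta}

for cancer, points in diff.items():
    print(f"{cancer:<11}{points:+d}")
# bladder    -2
# breast     +1
# cervical   +4
# colorectal +7
# lung       +26
# prostate   +0
```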

Discussion

Overall, these findings suggest that Truveta and SEER capture similar oncology populations across several core demographic dimensions, while also showing meaningful differences in representation for some racial and ethnic groups.

These findings also suggest that one-year overall survival was similar in Truveta and SEER for bladder, breast, cervical, and prostate cancers, whereas Truveta estimates for colorectal and lung cancer were higher than those in SEER: 8 percentage points higher for colorectal cancer and 26 percentage points higher for lung cancer.

This analysis suggests that multi-health system EHR data from Truveta can provide oncology populations and survival estimates that are broadly aligned with a major US cancer registry. Because EHR-based real-world data can offer richer longitudinal detail on patient care—including information on biomarkers, treatment patterns, and clinical notes—real-world oncology data are critical for understanding the full patient journey.

Several considerations are important when interpreting these results. Truveta and SEER are distinct data sources with different data collection processes, and the Truveta cohort required prior healthcare engagement and tumor-characterizing data, which may favor patients with more complete documentation. The analysis also focused on adults with one primary tumor and six solid tumor types, so results may differ in other cancer populations or narrower clinical subgroups.

Even with those considerations, these findings support the use of real-world EHR data to study oncology patients. Across several common cancer types, Truveta showed strong agreement with SEER in both cohort composition and one-year overall survival, providing an important foundation for future oncology real-world evidence studies.

Citations

  1. Di Maio M, Perrone F, Conte P. 2020. Real‐World Evidence in Oncology: Opportunities and Limitations. Oncologist [Internet]. [accessed 2026 May 27] 25(5):e746–e752. doi:10.1634/theoncologist.2019-0647
  2. About the SEER Program. SEER [Internet]. [accessed 2026 May 27]. https://seer.cancer.gov/about/overview.html
  3. SEER Treatment Data Limitations (November 2023 Submission) – SEER Data & Software. SEER [Internet]. [accessed 2026 May 27]. https://seer.cancer.gov/data-software/documentation/seerstat/nov2023/treatment-limitations-nov2023.html
