Cancer is one of the most heavily funded areas of medical research. And yet, with thousands of therapies in development and billions of dollars spent each year, many foundational questions persist: Why do patients stop treatment? How do outcomes vary across populations? What happens after the first line of therapy fails?
The challenge isn’t a lack of innovation. It’s the absence of comprehensive data. Clinical trials are too restrictive, registries are too slow, and most real-world datasets only capture a fragment of the cancer journey. To close these evidence gaps, oncology research needs a new approach—one that captures the complexity, diversity, and evolution of cancer care in the real world.
With more than 7 million oncology patient journeys across 100+ cancer types, Truveta goes beyond isolated snapshots from oncology clinics to capture the entire patient journey across specialties and care settings. Regulatory-grade EHR data, linked with closed claims, mortality, and tumor genetics, is updated daily and built to exceed FDA standards, offering a more complete foundation for oncology research.

Available for research: Real-world cohorts across 100+ cancer types
Researchers now have the opportunity to explore cancer care with unprecedented depth and breadth. Truveta’s oncology dataset includes extensive populations across both rare and common cancers.
Below is a preview of just some of the cancers available, with detailed patient counts and supporting data like imaging, procedures, labs, and notes.





Critical gaps in oncology real-world data (RWD), and how to close them
- Incomplete patient journeys
Most datasets only reflect care delivered in medical oncology. But cancer diagnosis and treatment span multiple settings—starting in primary care or radiology, extending to surgery, infusion, imaging, and palliative care.
Truveta captures data from across all specialties and settings, enabling researchers to follow the complete, longitudinal journey of each patient—from first symptoms to long-term outcomes.
- Missing critical concepts
Essential concepts, such as staging, progression, response, and line of therapy, are often buried in unstructured notes or absent altogether.
The Truveta Language Model extracts these concepts at scale using clinically trained AI, transforming billions of EHR data points from clinical notes and images each day into structured, research-ready variables.
- Out-of-date data
Registries and curated datasets often lag by 1–2 years or more, missing current treatment patterns and delaying insight.
Truveta Data is refreshed daily, giving researchers timely access to the most current view of cancer care and enabling near-real-time analysis of therapy adoption, switching, and discontinuation.
- Limited population diversity
Many RWD sources are built from single institutions or narrow patient groups, limiting representativeness and generalizability.
Truveta Data includes over 7 million cancer patient journeys, offering scale and diversity unmatched in oncology research — with representation across regions, ethnicities, payers, and care settings.
- Disconnected outcomes and economics
Clinical data often lives separately from cost and outcomes data, making it difficult to study total burden or real-world value.
Truveta links EHRs with closed claims and mortality, enabling comprehensive analyses that include treatment patterns, survival, and cost of care across all payer types.
What to look for in modern oncology RWD
If you’re evaluating real-world data for oncology, here are five questions to ask:
- Comprehensiveness: Does the dataset include non-oncology care, such as primary care, urology, or radiology?
- Data depth: Are concepts like stage, line of therapy, progression, and response captured or extractable?
- Timeliness: Is the data updated frequently enough to monitor trends, postmarket use, or disparities in near real time?
- Linkage capabilities: Can you link to mortality or claims data to understand outcomes and economics?
- Regulatory readiness: Is the data model transparent and well-documented enough to support regulatory-grade research?
With Truveta Data, the answer to each of these questions is yes. Truveta Data is the most representative, complete, and timely patient journey data, exceeding FDA standards for data quality, data provenance, and audit readiness.
Genomics and the future of precision oncology
The future of precision oncology relies on researchers’ understanding of the biological drivers of disease at scale. The Truveta Genome Project is making this possible by creating the largest and most diverse genotypic and phenotypic dataset ever assembled.
In partnership with 30 health systems, Regeneron Genetics Center, and Illumina, Truveta will sequence the exomes of tens of millions of consented patients and link that data to longitudinal EHRs (including notes and images), closed claims, and mortality data. For oncology, this means new opportunities to understand treatment response, resistance mechanisms, and disease progression across a broad and representative population.

Moving beyond the blind spots
Real-world cancer research is entering a new era — one defined by depth, speed, and completeness. Whether you’re working to accelerate discovery, validate outcomes, or expand access, the data you use will determine what’s possible.
Reach out to explore Truveta Data and learn more about our oncology capabilities.