Truveta Data

Powering real-time insights

Get the most complete, timely, and representative view of US patient care

Truveta provides daily-updated EHR data for more than 120 million patients—sourced directly from leading US health systems with a shared mission of Saving Lives with Data. This trusted partnership ensures consistent data quality, full traceability to the clinical source, and access to unique and clean data for research.

See the full picture with linked EHR data

By linking complete EHR data with closed claims and mortality data, Truveta enables researchers to generate both clinical and economic insights across the entire care journey. Track diagnoses, treatments, outcomes, and costs across settings, payers, and populations—supporting comparative effectiveness, burden of illness, health economics research, and more. Truveta Data can also be linked with proprietary datasets for expanded research applications.

Closed claims available for 200M+ patients across 100+ commercial payers, Medicare, and Medicaid

Includes medical and pharmacy claims dating back to 2016

Expands visibility into longitudinal outcomes and total cost of care
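To make the idea of linked EHR and claims data concrete, here is a minimal sketch of joining the two on a shared de-identified patient key and attaching total paid amounts to each encounter. The field names (`patient_key`, `paid_amount`, and so on) and the sample rows are assumptions made for illustration; they are not Truveta's schema or API.

```python
from collections import defaultdict

# Hypothetical, hand-made rows. Field names are illustrative only.
ehr_encounters = [
    {"patient_key": "p1", "date": "2023-01-10", "diagnosis": "type 2 diabetes"},
    {"patient_key": "p2", "date": "2023-02-03", "diagnosis": "asthma"},
]
claims = [
    {"patient_key": "p1", "service_date": "2023-01-10", "paid_amount": 120.0},
    {"patient_key": "p1", "service_date": "2023-03-15", "paid_amount": 80.5},
    {"patient_key": "p3", "service_date": "2023-04-01", "paid_amount": 40.0},
]


def total_paid_by_patient(claim_rows):
    """Sum paid amounts per patient key."""
    totals = defaultdict(float)
    for claim in claim_rows:
        totals[claim["patient_key"]] += claim["paid_amount"]
    return dict(totals)


def link_ehr_to_claims(encounters, claim_rows):
    """Left-join encounters to per-patient claim totals.

    Encounters with no matching claims get total_paid=None, so gaps in
    claims coverage stay visible rather than being reported as zero cost.
    """
    totals = total_paid_by_patient(claim_rows)
    return [
        {**enc, "total_paid": totals.get(enc["patient_key"])}
        for enc in encounters
    ]


linked = link_ehr_to_claims(ehr_encounters, claims)
```

Keeping unmatched encounters with a `None` total, instead of dropping them, is a design choice that makes it easy to measure how many EHR patients have linked claims before running any cost analysis.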

Key features

| Truveta | Other | Impact |
| --- | --- | --- |
| 120M+ electronic health records directly from US health systems | | Access daily updates and unprecedented completeness, including images and clinical notes |
| Daily refreshes | | Supports study of emerging therapies, captures early signals, and reduces reliance on costly trials and registries |
| Clinical notes at scale | Few RWD vendors | Unlock any clinical concept of interest with expert-level AI to enable novel research |
| Longitudinal imaging studies | Lack EHR integration | Enable AI model development, adjudicate outcomes, and study real-world response |
| Integrated closed claims and mortality | Often requires linking | Conduct robust comparative effectiveness research with detailed cost, utilization, and outcomes data |
| Pediatric and mother-child data | Few RWD vendors | Study underrepresented patients and address critical evidence gaps in children |
| Admission-discharge-transfer (ADT) data | | Enables minute-level comparative effectiveness of procedure times, recovery, and lengths of stay |
| Regulatory grade | Limited provenance | Aligns with FDA standards and provides full, audit-ready data provenance |
| Immediately available | Data cuts licensed separately | Provides immediate, unlimited access to row-level patient data for health economics and outcomes research |
| Broad payer mix | Often skews commercial | Offers broad coverage across 100+ payers for nationally representative studies |
| Emerging genomics and phenotype capabilities | Limited EHR coverage | Enables deep clinical research and broad use with access to vitals, labs, images, clinical notes, and long-term outcomes |

Work faster with clean data, normalized with expert-led AI

To unlock these insights across the care journey, data must first be clean, consistent, and structured at scale. The Truveta Language Model cleans and normalizes trillions of daily EHR data points, giving you high-quality, ready-to-analyze inputs—no wrangling required.

Standardized across systems, specialties, and formats for consistency
Mapped to standard terminologies (e.g., SNOMED CT, RxNorm, LOINC) for analysis-ready insights
Delivered with full traceability to the clinical source, meeting FDA standards for data quality and audit readiness
120M+ electronic health records

All care settings
5+ years longitudinal history
Diagnosis (SNOMED, ICD)
Procedure (CPT, HCPCS)
Medication (RxNorm, NDC)
Labs (LOINC)
Immunizations (CVX)
Pharmacy

7B+ clinical notes

100M+ imaging studies

45M+ unique devices

1M+ mother-child pairs

Example of the Truveta Language Model (TLM) mapping lab results to the appropriate medical ontology
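As a rough illustration of what mapping lab results to a standard terminology involves, the sketch below normalizes free-text local lab names to LOINC codes with a small lookup table. This is a toy stand-in, not the Truveta Language Model, which the page describes as an AI-based approach; the local lab names, the record fields, and the function `normalize_lab` are all hypothetical.

```python
# Hypothetical lookup from cleaned local lab names to (LOINC code, LOINC name).
# A real normalization model handles far more variation than a dictionary can.
LOCAL_TO_LOINC = {
    "glucose, serum": ("2345-7", "Glucose [Mass/volume] in Serum or Plasma"),
    "hba1c": ("4548-4", "Hemoglobin A1c/Hemoglobin.total in Blood"),
    "hemoglobin": ("718-7", "Hemoglobin [Mass/volume] in Blood"),
}


def normalize_lab(record):
    """Return a copy of the record with LOINC fields attached.

    The local name is lowercased and whitespace-collapsed before lookup.
    Unmapped names get None so they can be routed to manual review.
    """
    key = " ".join(record["local_name"].lower().split())
    match = LOCAL_TO_LOINC.get(key)
    if match is None:
        return {**record, "loinc_code": None, "loinc_name": None}
    code, name = match
    return {**record, "loinc_code": code, "loinc_name": name}


raw = {"local_name": "  HbA1c ", "value": 6.1, "unit": "%"}
normalized = normalize_lab(raw)
```

The original record is left unchanged and the source name is kept alongside the mapped code, which preserves traceability from the normalized value back to what the source system recorded.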

Go beyond structured data with notes and images

Tap into previously inaccessible sources of clinical insight—now available at scale and mapped to longitudinal EHR data. When paired with structured data, notes and images enable deeper understanding of disease progression, treatment safety and effectiveness, reasons for treatment decisions, and more.

Access 7B+ clinical notes across all care settings and note types

Stage of illness
Treatment, reason for change in treatment regimen
Treatment not considered due to patient preference
Genomic variants
Specific staging information across recurrence staging, clinical staging, and pathology staging

Analyze 100M+ medical images—searchable and linked with rich EHR data

Unlock rare visibility into maternal and pediatric care

Truveta provides data on more than 1.4 million deterministically linked mother–child pairs. Researchers can study prenatal risk factors, birth outcomes, early drug and vaccine safety, and pediatric development using EHR data that spans pregnancy through early childhood.

Power new discoveries by linking genetics to real-world care

The Truveta Genome Project will create the largest and most diverse database of genotypic and phenotypic information ever assembled to enable drug discovery, optimize clinical trials, and transform how diseases are prevented, diagnosed, and cured. This genetic data will be linked to de-identified medical records and added to Truveta Data for research.