
ISPOR 2026: Mapping oncology patient journeys using a data-driven Markov transition matrix

May 18, 2026

Authors: Youngwon Kim, PhD; Wilson Lau, PhD; Ehsan Alipour, MD, PhD; Sihang Zeng; Anand Oka, PhD; Jay Nanduri, MBA, MS (all Truveta, Inc, Bellevue, WA)

Characterizing Oncology Patient Journeys and Health State Transitions Using a Data-Driven Markov Transition Matrix in Large-Scale Electronic Health Records
  • In more than 1.7 million cancer patients, a data-driven Markov transition matrix identified clinically coherent health state transitions without requiring predefined feature selection.
  • The model surfaced meaningful persistence and comorbidity patterns, including strong clustering across cardiometabolic, respiratory, and mental health conditions.
  • These findings suggest MTMs could provide a reproducible framework for characterizing oncology patient journeys and informing empirically grounded state-transition models in real-world evidence.

Oncology patient journeys are complex and dynamic, involving transitions among periods of stability, deterioration, and remission. Existing electronic health record (EHR)-based studies often rely on predefined feature sets derived from disease-specific knowledge, potentially overlooking latent patterns and temporal dependencies in longitudinal data. 

Markov transition matrices (MTMs) offer a data-driven way to empirically define health states and estimate transition probabilities over time without prior feature selection. This study evaluates the descriptive and structural validity of MTMs for characterizing condition persistence and comorbidity dynamics in a large, nationally representative cancer population and explores their utility for informing empirically grounded state-transition models in health outcomes research.

Methods

We analyzed de-identified EHR data from 1,756,606 cancer patients within Truveta Data, representing longitudinal care across US health systems. For each patient, the first cancer-related diagnosis served as the index event, and analysis was restricted to the 24-month window prior to the index diagnosis.
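A minimal sketch of the cohort step, assuming each record is a hypothetical `Event` with a patient ID, a date, a condition code, and a boolean `is_cancer` flag marking cancer-related diagnoses (the actual Truveta Data schema and cancer code set are not described here). The first cancer-related record defines the index date, and only records in the preceding 24 months are kept. Whether the index day itself falls inside the window is not stated, so this sketch exposes it as a parameter.

```python
from datetime import date
from typing import Dict, Iterable, List, NamedTuple


class Event(NamedTuple):
    """Hypothetical EHR record; field names are illustrative, not a real schema."""
    patient_id: str
    day: date
    code: str
    is_cancer: bool  # assumed flag for cancer-related diagnoses


def shift_years(d: date, years: int) -> date:
    """Move a date back by whole years (24 months = 2 years), clamping Feb 29."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:  # Feb 29 mapped onto a non-leap year
        return d.replace(year=d.year - years, day=28)


def pre_index_sequences(
    events: Iterable[Event], lookback_years: int = 2, include_index_day: bool = False
) -> Dict[str, List[str]]:
    """Return each cohort patient's time-ordered condition codes in the lookback window."""
    by_patient: Dict[str, List[Event]] = {}
    for e in events:
        by_patient.setdefault(e.patient_id, []).append(e)

    sequences: Dict[str, List[str]] = {}
    for pid, evs in by_patient.items():
        evs.sort(key=lambda e: e.day)  # stable sort: same-day ties keep input order
        cancer_days = [e.day for e in evs if e.is_cancer]
        if not cancer_days:
            continue  # no index event, so the patient is not in the cohort
        index_day = cancer_days[0]  # first cancer-related diagnosis
        start = shift_years(index_day, lookback_years)
        sequences[pid] = [
            e.code
            for e in evs
            if start <= e.day
            and (e.day < index_day or (include_index_day and e.day == index_day))
        ]
    return sequences
```

Ordering records within a single day is left to input order here; a real pipeline would need an explicit tie-breaking rule, since it changes which transitions are counted.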

MTMs were constructed to estimate transition probabilities $P(S_{t+1} \mid S_t)$ from temporally ordered condition sequences, where $S_t$ is the state at step $t$ of a patient's sequence. To support stable estimation and interpretable state spaces, the analyses were restricted to the 30 most frequent conditions (by occurrence count). Less frequent conditions were treated as chain-breaking events to preserve temporal structure while reducing noise. Self-transitions (diagonal entries) captured condition persistence, and off-diagonal transitions captured comorbidity dynamics. The population had a mean age of 68.1 years; 58% were female and ~80% were White (Table 1).

Table 1. Demographic characteristics of the 1,756,606 oncology patients included in the Markov transition matrix analysis.
  • Age: mean 68.1 years (SD 17.0)
  • Sex: 58.1% female
  • Race: White 79.6%; Black or African American 8.7%; Asian 2.7%; other or unknown 9.0%
  • Ethnicity: Hispanic or Latino 6.3%
  • Marital status: married 48.7%; unmarried 13.8%; widowed 8.8%; other or unknown 28.7%
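The estimation step in Methods can be sketched as counting consecutive pairs and row-normalizing, which gives the maximum-likelihood estimate $\hat{P}(S_{t+1} = j \mid S_t = i) = n_{ij} / \sum_k n_{ik}$, where $n_{ij}$ counts observed $i \to j$ steps. This is a minimal illustration, not the study's implementation. In particular, it assumes that "chain-breaking" means no transition is counted across a condition outside the top 30, so the chain restarts after it; `build_mtm` is a hypothetical helper name.

```python
from collections import Counter
from typing import List, Sequence, Tuple


# Illustrative helper (hypothetical name), not the study's code.
def build_mtm(
    sequences: Sequence[Sequence[str]], top_k: int = 30
) -> Tuple[List[str], List[List[float]]]:
    """Estimate a row-stochastic transition matrix over the top_k most frequent conditions."""
    freq = Counter(code for seq in sequences for code in seq)
    states = [code for code, _ in freq.most_common(top_k)]  # state space by occurrence count
    index = {s: i for i, s in enumerate(states)}
    counts = [[0] * len(states) for _ in states]

    for seq in sequences:
        prev = None
        for code in seq:
            cur = index.get(code)
            if cur is None:
                prev = None  # rare condition: break the chain, no pair spans it
                continue
            if prev is not None:
                counts[prev][cur] += 1  # diagonal = persistence, off-diagonal = comorbidity
            prev = cur

    matrix = []
    for row in counts:
        total = sum(row)
        # Rows with no outgoing transitions stay all-zero instead of dividing by zero.
        matrix.append([c / total if total else 0.0 for c in row])
    return states, matrix
```

Treating rare codes as chain breaks, rather than silently dropping them, avoids inventing a direct A → B transition for a sequence that was actually A → (rare) → B.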

Results

Across the top 30 conditions (Figure 1), the mean self-transition probability was 0.07 (range: 0.01–0.41), with persistence concentrated in chronic and treatment-related states: repeated prescription (0.41), long-term anticoagulant use (0.27), breast cancer (0.15), type 2 diabetes (0.13), and atrial fibrillation (0.11). In contrast, symptom-level conditions demonstrated lower persistence (~0.01–0.02).
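The persistence figures above (mean, range, and the most persistent states) are all functions of the matrix diagonal. A minimal sketch, given a list of state labels and a row-stochastic matrix in the same order; the helper name is hypothetical and the test values are toy numbers, not study results.

```python
from typing import Dict, List, Sequence, Tuple


# Illustrative helper (hypothetical name), not the study's code.
def persistence_summary(
    states: Sequence[str], matrix: Sequence[Sequence[float]], top_n: int = 5
) -> Dict[str, object]:
    """Summarize self-transition probabilities, i.e. the diagonal entries P[i][i]."""
    diag = [matrix[i][i] for i in range(len(states))]
    ranked: List[Tuple[str, float]] = sorted(
        zip(states, diag), key=lambda pair: pair[1], reverse=True
    )
    return {
        "mean": sum(diag) / len(diag),
        "min": min(diag),
        "max": max(diag),
        "most_persistent": ranked[:top_n],
    }
```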

Cardiometabolic conditions formed the densest subnetwork, with strong bidirectional transitions among hypertension, hyperlipidemia, and diabetes (e.g., diabetes → hypertension 0.26; coronary arteriosclerosis → diabetes 0.23). The largest magnitudes, however, likely reflect care processes rather than biological relationships (e.g., hyperlipidemia → hypertensive disorder 0.62; GERD → hyperlipidemia 0.44), potentially driven by shared screening and co-documentation.
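One simple way to make "densest subnetwork" concrete is the mean off-diagonal transition probability among a chosen set of conditions. The source does not say which density measure was used, so this definition, and the helper name `subnetwork_density`, are assumptions; toy values only.

```python
from typing import Sequence


# Illustrative helper (hypothetical name); the density definition is an assumption.
def subnetwork_density(
    states: Sequence[str], matrix: Sequence[Sequence[float]], members: Sequence[str]
) -> float:
    """Mean transition probability between distinct members of a condition set."""
    idx = [list(states).index(m) for m in members]
    between = [matrix[i][j] for i in idx for j in idx if i != j]
    if not between:
        raise ValueError("need at least two member conditions")
    return sum(between) / len(between)
```

Comparing this value for a candidate cluster against other condition sets gives a rough, threshold-free sense of how tightly the cluster's members feed into one another.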

Mental health transitions showed marked directional asymmetry, with depression emerging as a downstream convergence state. Strong inflows into depression were observed from chronic pain (0.29), anxiety (0.28), asthma (0.19), and COPD (0.17), with comparatively limited reverse transitions. Respiratory conditions bridged the cardiometabolic and mental health clusters: beyond their inflows into depression, notable transitions included asthma → COPD (0.15), COPD → coronary disease (0.20), and asthma → atrial fibrillation (0.11).
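Directional asymmetry can be screened by comparing $P_{ij}$ with $P_{ji}$ for each pair and counting how many dominant directions point into each state; a state with many inbound asymmetric pairs behaves like the convergence state described above. This is a sketch under assumptions: the `min_gap` threshold and helper names are hypothetical, and because $P_{ij}$ and $P_{ji}$ condition on different states, a flow-weighted comparison (scaling by state frequency) could rank pairs differently.

```python
from collections import Counter
from typing import List, Sequence, Tuple


# Illustrative helpers (hypothetical names), not the study's code.
def asymmetric_pairs(
    states: Sequence[str], matrix: Sequence[Sequence[float]], min_gap: float = 0.1
) -> List[Tuple[str, str, float]]:
    """Return (dominant source, target, gap) for pairs where |P[i][j] - P[j][i]| >= min_gap."""
    pairs = []
    n = len(states)
    for i in range(n):
        for j in range(i + 1, n):
            gap = matrix[i][j] - matrix[j][i]
            if abs(gap) >= min_gap:
                src, dst = (i, j) if gap > 0 else (j, i)
                pairs.append((states[src], states[dst], abs(gap)))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


def convergence_counts(pairs: Sequence[Tuple[str, str, float]]) -> Counter:
    """Count how many asymmetric pairs point into each target state."""
    return Counter(dst for _, dst, _ in pairs)
```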

Atrial fibrillation acted as a late-stage multimorbidity hub, transitioning toward anticoagulation therapy, coronary disease, and hypertension (~0.11–0.12), whereas cancer-related diagnoses remained comparatively self-contained, consistent with focused oncologic care trajectories.
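A rough way to separate "hub" states from "self-contained" ones is to count how many distinct next conditions a state reaches with at least some minimum probability, excluding its self-transition. The 0.1 threshold and the helper name are illustrative assumptions, not the study's criteria; toy values only.

```python
from typing import Dict, Sequence


# Illustrative helper (hypothetical name); the 0.1 threshold is an arbitrary choice.
def hub_out_degree(
    states: Sequence[str], matrix: Sequence[Sequence[float]], threshold: float = 0.1
) -> Dict[str, int]:
    """Count distinct next conditions reached with probability >= threshold (self excluded)."""
    return {
        state: sum(1 for j, p in enumerate(matrix[i]) if j != i and p >= threshold)
        for i, state in enumerate(states)
    }
```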

Figure 1. Heatmap of Markov sequential transition probabilities among the top 30 conditions observed across 1,756,592 oncology patient journeys. Rows represent the current condition and columns represent the next condition in the sequence; darker blue cells indicate higher transition probabilities. Strong self-transitions are observed for repeated prescriptions, long-term anticoagulant use, breast cancer, type 2 diabetes, and atrial fibrillation, indicating persistence of chronic or treatment-related conditions. Dense bidirectional transitions appear among cardiometabolic conditions such as hypertension, hyperlipidemia, diabetes, and coronary arteriosclerosis. Additional transition patterns connect respiratory conditions, chronic pain, anxiety, and depressive episodes, illustrating multimorbidity dynamics across oncology patient journeys.

Discussion

Data-driven MTMs provide an interpretable and scalable framework for characterizing condition persistence and comorbidity dynamics in large-scale EHR data without prespecified feature selection. The approach recovers clinically coherent transition patterns while revealing directional asymmetries across chronic, respiratory, and mental health conditions. MTMs capture both biological relationships and healthcare utilization signals, offering a reproducible and empirically grounded foundation for parameterizing state-transition models in oncology and other chronic disease populations.
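As a sketch of how an estimated matrix might seed a cohort state-transition model, a starting state distribution $\pi_0$ can be propagated as $\pi_{t+1} = \pi_t P$. This assumes a time-homogeneous, first-order Markov chain. The MTM here is indexed by successive recorded conditions rather than fixed calendar cycles, so mapping its steps onto model cycles would be an additional assumption. The helper name is hypothetical; toy two-state values only.

```python
from typing import List, Sequence


# Illustrative helper (hypothetical name), not the study's code.
def propagate(
    initial: Sequence[float], matrix: Sequence[Sequence[float]], steps: int
) -> List[List[float]]:
    """Return the state distribution at each step, using pi_{t+1} = pi_t P."""
    n = len(matrix)
    if len(initial) != n or any(len(row) != n for row in matrix):
        raise ValueError("initial distribution and matrix dimensions must match")
    dist = list(initial)
    history = [dist]
    for _ in range(steps):
        # Row vector times matrix: mass in state i flows to state j with weight P[i][j].
        dist = [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]
        history.append(dist)
    return history
```

Any all-zero row (a state with no observed outgoing transitions) leaks probability mass at each step, so such rows would need an explicit absorbing or exit rule before the matrix is used this way.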