Authors: Youngwon Kim, PhD; Wilson Lau, PhD; Ehsan Alipour, MD, PhD; Sihang Zeng; Anand Oka, PhD; Jay Nanduri, MBA, MS (all authors: Truveta, Inc, Bellevue, WA)
- In more than 1.7 million cancer patients, a data-driven Markov transition matrix identified clinically coherent health state transitions without requiring predefined feature selection.
- The model surfaced meaningful persistence and comorbidity patterns, including strong clustering across cardiometabolic, respiratory, and mental health conditions.
- These findings suggest MTMs could provide a reproducible framework for characterizing oncology patient journeys and informing empirically grounded state-transition models in real-world evidence.
Oncology patient journeys are complex and dynamic, involving transitions among periods of stability, deterioration, and remission. Existing electronic health record (EHR)-based studies often rely on predefined feature sets derived from disease-specific knowledge, potentially overlooking latent patterns and temporal dependencies in longitudinal data.
Markov transition matrices (MTMs) offer a data-driven way to empirically define health states and estimate transition probabilities over time without prior feature selection. This study evaluates the descriptive and structural validity of MTMs for characterizing condition persistence and comorbidity dynamics in a large, nationally representative cancer population and explores their utility for informing empirically grounded state-transition models in health outcomes research.
Methods
We analyzed de-identified EHR data from 1,756,606 cancer patients within Truveta Data, representing longitudinal care across US health systems. For each patient, the first cancer-related diagnosis served as the index event, and analysis was restricted to the 24-month window prior to the index diagnosis.
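The cohort windowing described above could be sketched roughly as follows. This is an illustrative sketch only, not the study's pipeline: the function name `pre_index_window`, the `(date, condition)` tuple layout, the approximation of 24 months as 730 days, and the choice to exclude events on the index date itself are all assumptions that the abstract does not specify.

```python
from datetime import date

# Assumption: "24 months" is approximated as 730 days.
LOOKBACK_DAYS = 730


def pre_index_window(events, cancer_conditions, lookback_days=LOOKBACK_DAYS):
    """Return (index_date, time-ordered events in the look-back window).

    events: list of (date, condition) tuples for one patient.
    cancer_conditions: set of condition labels treated as cancer-related.
    Events on the index date itself are excluded (an assumption).
    """
    cancer_dates = [d for d, c in events if c in cancer_conditions]
    if not cancer_dates:
        return None, []  # patient has no cancer-related diagnosis
    index_date = min(cancer_dates)  # first cancer-related diagnosis
    window = sorted(
        (d, c)
        for d, c in events
        if 0 < (index_date - d).days <= lookback_days
    )
    return index_date, window
```

The returned window is sorted by date, so the condition labels can be read off in order to form the temporally ordered sequence used in the next step.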
MTMs were constructed to estimate transition probabilities P(S_{t+1} | S_t) from temporally ordered condition sequences, where S_t denotes the condition at step t of a patient's sequence. To support stable estimation and interpretable state spaces, analyses were restricted to the 30 most frequent conditions (by occurrence count). Less frequent conditions were treated as chain-breaking events to preserve temporal structure while reducing noise. Self-transitions captured condition persistence, and off-diagonal transitions captured comorbidity dynamics. The population had a mean age of 68.1 years; 58% were female and ~80% were white (Table 1).
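A minimal sketch of this construction is shown below, assuming the usual row-normalized count estimator, P(j | i) = n_ij / sum_k n_ik, which the abstract does not state explicitly. The function name `build_mtm` and the list-of-lists input format are hypothetical. A condition outside the top-k set resets the chain, so no transition is counted across it.

```python
from collections import Counter, defaultdict


def build_mtm(sequences, top_k=30):
    """Estimate a first-order transition matrix from condition sequences.

    sequences: list of per-patient lists of condition labels, time-ordered.
    Returns (states, matrix) with matrix[src][dst] = estimated P(dst | src).
    """
    # Rank conditions by total occurrence count and keep the top_k as states.
    counts = Counter(c for seq in sequences for c in seq)
    states = [c for c, _ in counts.most_common(top_k)]
    keep = set(states)

    # Count consecutive pairs; a non-state condition breaks the chain.
    pair_counts = defaultdict(Counter)
    for seq in sequences:
        prev = None
        for cond in seq:
            if cond not in keep:
                prev = None  # chain-breaking event
                continue
            if prev is not None:
                pair_counts[prev][cond] += 1
            prev = cond

    # Row-normalize counts into probabilities (all-zero row if no outflow).
    matrix = {}
    for src in states:
        total = sum(pair_counts[src].values())
        matrix[src] = {
            dst: (pair_counts[src][dst] / total if total else 0.0)
            for dst in states
        }
    return states, matrix
```

In this sketch the diagonal entries `matrix[s][s]` correspond to self-transitions (persistence) and the remaining entries to transitions between different conditions.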
Results
Across the top 30 conditions (Figure 1), the mean self-transition probability was 0.07 (range: 0.01–0.41), with persistence concentrated in chronic and treatment-related states: repeated prescription (0.41), long-term anticoagulant use (0.27), breast cancer (0.15), type 2 diabetes (0.13), and atrial fibrillation (0.11). In contrast, symptom-level conditions demonstrated lower persistence (~0.01–0.02).
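Persistence summaries of this kind can be read directly off the diagonal of a transition matrix. The sketch below assumes a nested-dictionary layout, `matrix[src][dst]`, and a hypothetical helper name `persistence_summary`; the toy matrix uses placeholder labels and values, not study results.

```python
def persistence_summary(matrix):
    """Summarize self-transition (diagonal) probabilities.

    matrix: dict of dicts with matrix[src][dst] = P(dst | src).
    """
    diag = {state: row[state] for state, row in matrix.items()}
    values = list(diag.values())
    return {
        "mean": sum(values) / len(values),
        "min": min(values),
        "max": max(values),
        # States ordered from most to least persistent.
        "ranked": sorted(diag.items(), key=lambda kv: kv[1], reverse=True),
    }


# Toy two-state matrix (placeholder values for illustration only).
toy_matrix = {
    "A": {"A": 0.5, "B": 0.5},
    "B": {"A": 0.9, "B": 0.1},
}
summary = persistence_summary(toy_matrix)
```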
Cardiometabolic conditions formed the densest subnetwork, with strong bidirectional transitions among hypertension, hyperlipidemia, and diabetes (e.g., diabetes → hypertension 0.26; coronary arteriosclerosis → diabetes 0.23). The largest transition probabilities, however, likely reflect care processes rather than biological relationships (e.g., hyperlipidemia → hypertensive disorder 0.62; GERD → hyperlipidemia 0.44), potentially driven by shared screening and co-documentation.
Mental health transitions showed marked directional asymmetry, with depression emerging as a downstream convergence state. Strong inflows into depression were observed from chronic pain (0.29), anxiety (0.28), asthma (0.19), and COPD (0.17), with comparatively limited reverse transitions. Respiratory conditions bridged cardiometabolic and mental health clusters, linking to both cardiovascular and emotional health endpoints. Beyond their inflows into depression, notable respiratory transitions included asthma → COPD (0.15), COPD → coronary disease (0.20), and asthma → atrial fibrillation (0.11).
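One simple way to quantify directional asymmetry and convergence is sketched below. The abstract does not say how asymmetry was measured, so the difference P(b | a) − P(a | b) and the helper names `asymmetry` and `top_inflows` are assumptions; the toy matrix uses placeholder labels and values rather than study estimates.

```python
def asymmetry(matrix, a, b):
    """Signed asymmetry: positive when a -> b is stronger than b -> a."""
    return matrix[a][b] - matrix[b][a]


def top_inflows(matrix, target, n=3):
    """Return the n source states with the largest P(target | source)."""
    inflows = [
        (src, row[target])
        for src, row in matrix.items()
        if src != target and target in row
    ]
    return sorted(inflows, key=lambda kv: kv[1], reverse=True)[:n]


# Toy three-state matrix (placeholder values for illustration only).
toy_matrix = {
    "X": {"X": 0.10, "Y": 0.60, "Z": 0.30},
    "Y": {"X": 0.50, "Y": 0.20, "Z": 0.30},
    "Z": {"X": 0.05, "Y": 0.15, "Z": 0.80},
}
x_to_z_asymmetry = asymmetry(toy_matrix, "X", "Z")
z_inflows = top_inflows(toy_matrix, "Z", n=2)
```

A state with many large inflows and small reverse transitions would behave like the "downstream convergence state" described above.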
Atrial fibrillation acted as a late-stage multimorbidity hub, transitioning toward anticoagulation therapy, coronary disease, and hypertension (~0.11–0.12), whereas cancer-related diagnoses remained comparatively self-contained, consistent with focused oncologic care trajectories.
Discussion
Data-driven MTMs provide an interpretable and scalable framework for characterizing condition persistence and comorbidity dynamics in large-scale EHR data without prespecified feature selection. The approach recovers clinically coherent transition patterns while revealing directional asymmetries across chronic, respiratory, and mental health conditions. MTMs capture both biological relationships and healthcare utilization signals, offering a reproducible and empirically grounded foundation for parameterizing state-transition models in oncology and other chronic disease populations.



