- A patient journey foundation model (PJFM) achieved 98% overall accuracy in imputing missing or incorrect units of measure (UoMs) across 438 unique UCUM units in electronic health record (EHR) data.
- The model accurately distinguished between clinically plausible alternative units, including units with overlapping numeric ranges such as mg/dL versus mg/L.
- These findings demonstrate how foundation models trained on longitudinal patient journeys can improve EHR data quality and support downstream clinical and research applications.
Electronic health records (EHRs) contain large amounts of longitudinal clinical information but are often incomplete or inconsistently represented. One common challenge is missing or incorrect units of measure (UoMs), which can substantially alter interpretation of laboratory and clinical values. For example, the same numeric value may represent very different clinical meanings depending on whether it is recorded in mg/dL or mmol/L.
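The mg/dL versus mmol/L example can be made concrete with blood glucose, used here purely as an illustration (the conversion factor depends on the analyte). Glucose has a molar mass of about 180.16 g/mol, so 1 mmol/L is roughly 18 mg/dL. The minimal Python sketch below, with a hypothetical function name, reads one recorded number under each unit.

```python
# Approximate molar mass of glucose: 180.16 g/mol, i.e. 180.16 mg per mmol.
GLUCOSE_MG_PER_MMOL = 180.16

def glucose_to_mmol_per_l(value: float, unit: str) -> float:
    """Express a glucose result in mmol/L (illustration only; glucose-specific)."""
    if unit == "mmol/L":
        return value
    if unit == "mg/dL":
        # mg/dL -> mg/L (multiply by 10), then mg/L -> mmol/L (divide by mg per mmol)
        return value * 10 / GLUCOSE_MG_PER_MMOL
    raise ValueError(f"unsupported unit: {unit}")

# The same recorded number, read under two different units.
for unit in ("mmol/L", "mg/dL"):
    print(unit, round(glucose_to_mmol_per_l(5.0, unit), 2))
```

Read as mmol/L, 5.0 is an unremarkable glucose level; read as mg/dL, it is about 0.28 mmol/L, a profoundly low value. Only the unit distinguishes the two.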
Foundation models trained on longitudinal patient trajectories may help address this challenge by learning contextual relationships across clinical events. In this study, we evaluated whether a Patient Journey Foundation Model (PJFM) could accurately impute missing UoMs across diverse laboratory and observational data.
Methods
Using a subset of Truveta Data, we represented each patient journey as a chronological sequence of clinical events, including diagnoses, laboratory values, units, and timing information. Events were encoded using the SNOMED CT, LOINC, and UCUM vocabularies.
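One way to hold such a sequence in code is sketched below in Python. The class, field, and function names are hypothetical and do not describe Truveta's actual schema; the SNOMED CT and LOINC codes are included only to show the three vocabularies side by side, with a UCUM unit attached to numeric events and left empty when the UoM is missing.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

@dataclass(frozen=True)
class ClinicalEvent:
    time: datetime
    event_type: str             # e.g. "diagnosis", "lab", "observation"
    code_system: str            # e.g. "SNOMED CT" or "LOINC"
    code: str
    value: Optional[float] = None
    unit: Optional[str] = None  # UCUM code; None when the UoM is missing

def build_journey(events: List[ClinicalEvent]) -> List[ClinicalEvent]:
    """Order one patient's events chronologically (stable for ties)."""
    return sorted(events, key=lambda e: e.time)

def needs_unit(journey: List[ClinicalEvent]) -> List[ClinicalEvent]:
    """Numeric events with no recorded unit, i.e. the imputation targets."""
    return [e for e in journey if e.value is not None and e.unit is None]

journey = build_journey([
    ClinicalEvent(datetime(2024, 3, 2), "lab", "LOINC", "2345-7", 104.0, "mg/dL"),  # glucose
    ClinicalEvent(datetime(2024, 1, 15), "diagnosis", "SNOMED CT", "44054006"),      # type 2 diabetes
    ClinicalEvent(datetime(2024, 3, 2), "observation", "LOINC", "29463-7", 82.0),    # body weight, unit missing
])
print([e.code for e in journey])  # ['44054006', '2345-7', '29463-7']
print(len(needs_unit(journey)))   # 1
```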
The PJFM architecture was based on GPT-2 (Radford et al., 2019) and extended to jointly model event type, medical code, value, unit, and time using transformer-based embeddings. The model was pretrained on next-event prediction, with an auxiliary head trained to predict the UoM of each observation and laboratory test event in the patient journey from among 438 unique UCUM units (Figure 1). UoMs were missing for 19% of laboratory tests and 50% of observations. Performance was evaluated using accuracy, precision, recall, and weighted F1-score.
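The abstract does not specify how the two objectives are combined or how events with missing UoMs enter training. A common pattern, assumed here rather than taken from the study, is to add a weighted cross-entropy term for the unit head that skips events whose unit is missing, and at inference to fill those gaps with the head's highest-scoring unit. The Python sketch below uses plain lists in place of tensors; `aux_weight`, `combined_loss`, and `impute_units` are hypothetical names.

```python
import math
from typing import List, Optional, Sequence

def log_softmax(logits: Sequence[float]) -> List[float]:
    # Numerically stable log-softmax over one row of unit scores.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def combined_loss(
    next_event_nll: float,
    unit_logits: Sequence[Sequence[float]],
    unit_targets: Sequence[Optional[int]],
    aux_weight: float = 0.5,  # hypothetical weighting of the auxiliary head
) -> float:
    """Next-event loss plus a masked cross-entropy term for the UoM head.

    Events whose recorded unit is missing (target None) add no UoM loss.
    """
    uom_terms = [
        -log_softmax(row)[target]
        for row, target in zip(unit_logits, unit_targets)
        if target is not None
    ]
    uom_loss = sum(uom_terms) / len(uom_terms) if uom_terms else 0.0
    return next_event_nll + aux_weight * uom_loss

def impute_units(
    unit_logits: Sequence[Sequence[float]],
    unit_targets: Sequence[Optional[int]],
) -> List[int]:
    """Keep recorded unit indices; fill missing ones with the top-scoring unit."""
    return [
        target if target is not None else max(range(len(row)), key=lambda i: row[i])
        for row, target in zip(unit_logits, unit_targets)
    ]

scores = [[2.0, 0.1, -1.0], [0.3, 1.7, 0.2]]  # 2 events x 3 hypothetical units
targets = [0, None]                            # second event has no recorded unit
print(combined_loss(1.2, scores, targets))
print(impute_units(scores, targets))           # [0, 1]
```

In a trained model, each row of `unit_logits` would come from a linear layer over the transformer's hidden state at that event, with one column per UCUM unit (438 here).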
Results
The PJFM achieved 98% overall accuracy in imputing UoMs, with a weighted average F1-score of 0.90. Performance was highest for commonly occurring units such as mg/dL, %, mmol/L, and g/dL.
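Accuracy counts only correct predictions, whereas weighted F1 averages per-unit F1 scores, each weighted by how often that unit truly occurs, so precision errors on a frequent unit can pull weighted F1 below accuracy. The pure-Python sketch below computes both from lists of true and predicted unit strings, following the usual definition of weighted F1 and scoring a unit that is never predicted correctly as F1 = 0. The toy labels are invented for illustration and are not study data.

```python
from collections import Counter
from typing import Sequence

def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def weighted_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Per-unit F1, averaged with weights equal to each unit's true count."""
    support = Counter(y_true)
    predicted = Counter(y_pred)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    total = 0.0
    for unit, n_true in support.items():
        tp = correct[unit]
        precision = tp / predicted[unit] if predicted[unit] else 0.0
        recall = tp / n_true
        f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
        total += n_true * f1
    return total / len(y_true)

y_true = ["mg/dL", "mg/dL", "mg/dL", "mg/L"]
y_pred = ["mg/dL", "mg/dL", "mg/dL", "mg/dL"]
print(accuracy(y_true, y_pred))               # 0.75
print(round(weighted_f1(y_true, y_pred), 3))  # 0.643
```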
The model also performed well for measurements with multiple clinically plausible units. For example, body weight and height units were predicted with 95–100% accuracy across kilograms, pounds, centimeters, feet, and inches. Predictions also reflected broader physiologic context: in a synthetic example, the same numeric values produced different predicted units depending on whether the resulting body mass index (BMI) was clinically plausible.
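The PJFM learns this behavior from data rather than from explicit rules, but the arithmetic it has to respect can be shown with a hand-written check. The Python sketch below is an illustration, not the model's mechanism: it converts a weight/height pair under each candidate UCUM unit and keeps only the combinations whose BMI falls in an assumed plausible range of 12–60 kg/m². The bounds and function names are hypothetical.

```python
from itertools import product

TO_KG = {"kg": 1.0, "[lb_av]": 0.45359237}               # kilogram, avoirdupois pound
TO_M = {"cm": 0.01, "[in_i]": 0.0254, "[ft_i]": 0.3048}  # centimeter, inch, foot

def bmi(weight: float, w_unit: str, height: float, h_unit: str) -> float:
    kg = weight * TO_KG[w_unit]
    m = height * TO_M[h_unit]
    return kg / (m * m)

def plausible_unit_pairs(weight: float, height: float, lo: float = 12.0, hi: float = 60.0):
    """(weight unit, height unit, BMI) triples whose BMI lies in [lo, hi]."""
    pairs = []
    for w_unit, h_unit in product(TO_KG, TO_M):
        value = bmi(weight, w_unit, height, h_unit)
        if lo <= value <= hi:
            pairs.append((w_unit, h_unit, round(value, 1)))
    return pairs

print(plausible_unit_pairs(70, 170))  # [('kg', 'cm', 24.2)]
print(plausible_unit_pairs(200, 70))  # [('[lb_av]', '[in_i]', 28.7)]
```

For other inputs more than one unit pair can survive such a fixed range check, which is where additional patient context would be needed to choose between them.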
Performance was also strong for measurements whose units have overlapping value distributions. For example, C-reactive protein (CRP) may be reported in either mg/dL or mg/L; despite overlapping numeric ranges, the model achieved 94% accuracy for mg/dL and 89% accuracy for mg/L.
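Because 1 mg/dL equals 10 mg/L, the same recorded CRP number implies a tenfold difference in concentration depending on its unit. The per-unit accuracies above are read here as the share of events whose true unit is u that the model labeled u, which is per-class recall; that reading is an assumption, since the abstract does not define the term. A minimal Python sketch with a hypothetical function name and invented toy labels:

```python
from collections import Counter
from typing import Dict, Sequence

def per_unit_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """For each true unit, the fraction of its events predicted correctly (recall)."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {unit: hits[unit] / n for unit, n in totals.items()}

# Toy CRP labels, invented for illustration only.
y_true = ["mg/dL", "mg/dL", "mg/L", "mg/L", "mg/L"]
y_pred = ["mg/dL", "mg/L", "mg/L", "mg/L", "mg/dL"]
print(per_unit_accuracy(y_true, y_pred))
```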
Discussion
This study demonstrates that patient journey foundation models can accurately impute missing or incorrect units of measure across diverse EHR data. Beyond predictive modeling, PJFMs may provide important data quality improvements by harmonizing incomplete structured clinical information.
The model’s ability to use broader patient context rather than isolated values suggests foundation models can learn clinically meaningful physiologic relationships from longitudinal EHR data. Improving UoM consistency may enhance downstream clinical research, interoperability, and real-world evidence generation.
While future applications of PJFMs may include disease prediction and clinical decision support, this work highlights how even focused use cases such as UoM imputation can meaningfully improve healthcare data quality.
These are preliminary research findings and have not been peer reviewed. Data are constantly changing and being updated. These findings are consistent with data accessed in September 2025.
Citations
- A. Radford et al., Language Models are Unsupervised Multitask Learners (2019).
- J. Su et al., RoFormer: Enhanced Transformer with Rotary Position Embedding (2024).
- K. Bradwell et al., Harmonizing Units of Measure in Electronic Health Records (2022).



