Extracting seizure frequency from clinical notes at scale using the Truveta Language Model

Mar 24, 2026

Seizure frequency is one of the most important outcomes in epilepsy research, but it often lives in unstructured clinical notes rather than standard EHR fields. In a recent Neurology abstract, researchers from SK Life Science and Truveta show how the Truveta Language Model can extract seizure frequency information at scale, turning narrative documentation into usable longitudinal data for real-world research.

Study objective

The study aimed to determine whether seizure frequency information could be reliably extracted from clinical notes at scale to support longitudinal research in patients with epilepsy treated with cenobamate.

To do this, researchers applied the Truveta Language Model (TLM), a multimodal large language model designed to extract structured information from billions of daily EHR data points.

Training and validation across diverse clinical notes

The model was trained using clinical notes from Truveta’s health system data. Notes were annotated and reviewed by internal clinical experts to ensure accurate identification of seizure-related information.

To capture the diversity of clinical documentation styles, the training data included notes from multiple provider types. Model performance was then evaluated using an independent sample of clinical notes from a separate set of patients.

Across the dataset, researchers incorporated 17 different note types, including progress notes, consultation notes, and history and physical documentation.
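
Evaluating on notes from a separate set of patients, as described above, is commonly done by splitting at the patient level rather than the note level, so that no patient contributes notes to both sets. The sketch below illustrates that idea only; it is not the study's actual pipeline. The function name `split_by_patient`, the record fields, and the example notes are all hypothetical.

```python
import random


def split_by_patient(notes, eval_fraction=0.2, seed=0):
    """Split notes into training and evaluation sets by patient.

    Hypothetical helper: every note from a given patient lands in exactly
    one of the two sets, so the evaluation sample is independent of the
    training sample at the patient level.
    """
    patient_ids = sorted({note["patient_id"] for note in notes})
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(patient_ids)

    # Hold out at least one patient for evaluation.
    n_eval = max(1, round(len(patient_ids) * eval_fraction))
    eval_patients = set(patient_ids[:n_eval])

    train = [n for n in notes if n["patient_id"] not in eval_patients]
    evaluation = [n for n in notes if n["patient_id"] in eval_patients]
    return train, evaluation


# Synthetic records for illustration only (not real patient data).
example_notes = [
    {"patient_id": "p1", "note_type": "progress note"},
    {"patient_id": "p1", "note_type": "consultation note"},
    {"patient_id": "p2", "note_type": "history and physical"},
    {"patient_id": "p3", "note_type": "progress note"},
    {"patient_id": "p3", "note_type": "progress note"},
    {"patient_id": "p4", "note_type": "consultation note"},
]

train_notes, eval_notes = split_by_patient(example_notes, eval_fraction=0.25, seed=7)
```

Splitting on `patient_id` rather than on individual notes avoids the case where a model is scored on a note written about a patient it already saw during training.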

Figure: Sample epilepsy patient note

Key findings

The analysis included 2,480 patients receiving cenobamate. Using the extracted information, researchers were able to construct longitudinal patient timelines capturing:

  • Seizure counts
  • Seizure frequency
  • Changes in seizure frequency
  • Temporal relationships between seizure mentions and treatment initiation

The model produced high-confidence extractions for 97% of notes and low-confidence extractions for the remaining 3%, indicating strong performance in identifying seizure-related information across varied clinical documentation.

These structured outputs enabled researchers to observe trends in seizure activity over time, providing a detailed view of each patient’s seizure condition before and after initiating cenobamate therapy.
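
To make the idea of a longitudinal timeline concrete, here is a minimal sketch of how extracted seizure-frequency mentions might be ordered and compared around a treatment start date. It assumes each mention has already been normalized to seizures per month. The `SeizureMention` class, the `summarize_timeline` function, and the example values are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class SeizureMention:
    note_date: date            # date of the clinical note
    seizures_per_month: float  # assumed normalized frequency from the note


def _mean(values):
    """Return the arithmetic mean, or None for an empty list."""
    return sum(values) / len(values) if values else None


def summarize_timeline(mentions, treatment_start):
    """Order mentions by date and compare frequency before and after treatment start."""
    ordered = sorted(mentions, key=lambda m: m.note_date)

    # Days relative to treatment initiation (negative means before).
    timeline = [
        ((m.note_date - treatment_start).days, m.seizures_per_month)
        for m in ordered
    ]

    mean_before = _mean([freq for day, freq in timeline if day < 0])
    mean_after = _mean([freq for day, freq in timeline if day >= 0])

    change = None
    if mean_before is not None and mean_after is not None:
        change = mean_after - mean_before

    return {
        "timeline": timeline,
        "mean_before": mean_before,
        "mean_after": mean_after,
        "change": change,
    }


# Synthetic example for illustration only (not real patient data).
mentions = [
    SeizureMention(date(2025, 5, 15), 3.0),
    SeizureMention(date(2025, 1, 10), 8.0),
    SeizureMention(date(2025, 8, 20), 1.0),
    SeizureMention(date(2025, 3, 5), 6.0),
]
summary = summarize_timeline(mentions, treatment_start=date(2025, 4, 1))
```

Expressing each mention as days relative to treatment initiation puts patients with different start dates on a common time axis, which is one simple way to compare seizure activity before and after therapy begins.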

Implications for epilepsy research

The findings illustrate how information embedded in routine clinical documentation can be systematically extracted and analyzed to support large-scale outcomes research.

For epilepsy, this type of analysis may enable researchers to examine real-world treatment effectiveness, changes in seizure frequency over time, and rates of seizure freedom across broader patient populations.

More broadly, the work demonstrates an approach to studying clinical outcomes that are commonly documented in narrative notes rather than structured fields.

Read the full study: Scalable Extraction of Seizure Frequency Information from Clinical Notes Using the Truveta Language Model.
