Long Short-Term Memory–GPT-4 Integration for Interpretable Biomedical Signal Classification: Proof-of-Concept Study
Background: Approximately 3.8 billion people lack access to essential health services, and diagnostic interpretation remains a major bottleneck in remote and resource-constrained settings. Limited access to specialists and the complexity of biomedical signal interpretation (eg, electrocardiogram [ECG] and electroencephalogram) contribute to delays in recognizing cardiovascular and neurological conditions.

Objective: The study aimed to develop and evaluate a technical framework that integrates long short-term memory (LSTM) networks with GPT-4 to provide automated biomedical signal classification together with human-readable interpretations, as a foundation for future deployment in resource-constrained environments.

Methods: A 2-layer LSTM (128→64 units) performed temporal feature extraction, followed by softmax-based classification of mutually exclusive classes. This configuration was selected in preliminary experiments comparing single-layer networks (64 or 128 units) with deeper architectures (128→64→32 units), because it balanced model capacity against overfitting risk and computational cost. The framework was evaluated on public PhysioNet datasets: Massachusetts Institute of Technology–Beth Israel Hospital (MIT-BIH) Arrhythmia, Physikalisch-Technische Bundesanstalt (PTB) Diagnostic ECG, Physikalisch-Technische Bundesanstalt–extra large (PTB-XL), Chapman-Shaoxing, Medical Information Mart for Intensive Care III (MIMIC-III) Waveforms, and Sleep-European Data Format (Sleep-EDF). Data were split at the patient level (70/15/15 for training, validation, and test), so that no patient's recordings appeared in more than one split, reducing the risk of data leakage. GPT-4 was accessed through its application programming interface (API) with structured prompts to generate clinical interpretations from the model outputs.
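The Methods describe three mechanical steps: a patient-level 70/15/15 split, softmax probabilities over mutually exclusive classes, and a structured prompt passed to GPT-4. Below is a minimal Python sketch of those steps, assuming each record is a (patient_id, sample) pair. The function names (`patient_level_split`, `softmax`, `build_interpretation_prompt`), the seed, the class labels, and the prompt wording are hypothetical illustrations, not the study's code, and no GPT-4 API call is made.

```python
import math
import random
from collections import defaultdict


def patient_level_split(records, train=0.70, val=0.15, seed=0):
    """Split (patient_id, sample) pairs so each patient lands in exactly one split.

    The 70/15/15 ratios apply to patients, not to individual beats or segments,
    so per-split sample counts can deviate from 70/15/15.
    """
    by_patient = defaultdict(list)
    for patient_id, sample in records:
        by_patient[patient_id].append(sample)

    patients = sorted(by_patient)          # fixed order so the seed is reproducible
    random.Random(seed).shuffle(patients)

    n_train = round(len(patients) * train)
    n_val = round(len(patients) * val)
    groups = (
        patients[:n_train],
        patients[n_train:n_train + n_val],
        patients[n_train + n_val:],        # remainder (about 15%) becomes the test set
    )
    # Flatten each patient group back into a list of samples.
    return tuple([s for p in g for s in by_patient[p]] for g in groups)


def softmax(logits):
    """Numerically stable softmax: class probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def build_interpretation_prompt(dataset, class_names, logits):
    """Hypothetical structured prompt turning classifier output into LLM input.

    Field names and wording are illustrative assumptions, not the study's prompt.
    """
    probs = softmax(logits)
    ranked = sorted(zip(class_names, probs), key=lambda cp: cp[1], reverse=True)
    predicted, confidence = ranked[0]
    prob_lines = "\n".join(f"- {name}: {p:.2f}" for name, p in ranked)
    return (
        "Role: assistant summarizing an automated biomedical signal classification.\n"
        f"Dataset: {dataset}\n"
        f"Predicted class: {predicted} (probability {confidence:.2f})\n"
        f"All class probabilities:\n{prob_lines}\n"
        "Task: explain the finding in plain language, state the model's uncertainty, "
        "and list what a clinician should verify before acting."
    )
```

For example, `build_interpretation_prompt("MIT-BIH", ["N", "S", "V"], [2.0, 0.5, -1.0])` ranks the illustrative class N first with probability 0.79, and the resulting string would then be sent to the model as the user message. Splitting by patient rather than by segment matters because beats from the same recording are highly correlated, so a segment-level split tends to inflate test scores.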
Results: On held-out test sets, classification performance was as follows: MIT-BIH 92.3% accuracy (F1-score=0.91, AUC=0.95), PTB Diagnostic 94.7% (F1-score=0.94, AUC=0.97), Physikalisch-Technische Bundesanstalt–extra large 88.9% (F1-score=0.88, AUC=0.93), Chapman-Shaoxing 91.2% (F1-score=0.90, AUC=0.94), Medical Information Mart for Intensive Care III 89.5% (F1-score=0.89, AUC=0.92), and Sleep-European Data Format 87.3% (F1-score=0.86, AUC=0.91). For the expert evaluation, 50 test cases per dataset were randomly sampled (150 total: 30 per class for MIT-BIH, 25 per class for PTB, and 20 per class for Children's Hospital Boston–Massachusetts Institute of Technology [CHB-MIT]) to ensure balanced class representation. Three board-certified physicians (2 cardiologists for the ECG datasets and 1 neurologist for the electroencephalogram dataset) independently reviewed the GPT-4–generated interpretations, blinded to whether the LSTM model had classified each signal correctly or incorrectly. Each interpretation was rated on a 5-point Likert scale (1=clinically inappropriate; 5=highly accurate and clinically useful). Interrater reliability, assessed with Fleiss κ, was 0.78 (substantial agreement). Expert evaluation of the generated interpretations (3 board-certified cardiologists) rated clinical accuracy 4.3 out of 5, clarity 4.6 out of 5, and actionability 4.2 out of 5, with strong interrater agreement (κ>0.85).

Conclusions: This proof-of-concept study demonstrates an explicit methodological integration of deep learning–based biomedical signal classification with GPT-4–based interpretation and provides a technical foundation for future prospective clinical validation, field studies, and regulatory review before clinical deployment in underserved settings.
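The Fleiss κ reported for the Likert ratings can be computed as in the minimal Python sketch below. It assumes every interpretation is rated by the same number of raters (a requirement of the statistic) and treats each Likert level 1–5 as a separate category; the function name `fleiss_kappa` and the count-table input layout are illustrative choices.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for N items, each rated by the same number n of raters.

    counts[i][j] is how many raters put item i into category j
    (for a 5-point Likert scale, j = 0..4 stands for ratings 1..5).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_cats = len(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    # p_j: share of all ratings that fall into category j
    total = n_items * n_raters
    p = [sum(row[j] for row in counts) / total for j in range(n_cats)]

    # P_i: fraction of rater pairs that agree on item i
    P_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]

    P_bar = sum(P_i) / n_items          # mean observed agreement
    P_e = sum(pj * pj for pj in p)      # agreement expected by chance
    if P_e == 1:
        raise ValueError("kappa is undefined when every rating falls in one category")
    return (P_bar - P_e) / (1 - P_e)
```

With 3 raters, a row such as `[0, 0, 0, 1, 2]` means one rater gave a 4 and two gave a 5. Fleiss κ treats the five levels as unordered, so a 4-versus-5 disagreement counts the same as 1-versus-5; for ordinal ratings, a weighted statistic such as Krippendorff's alpha with an ordinal distance metric is a common alternative.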
JMIR Formative Res: Long Short-Term Memory–GPT-4 Integration for Interpretable Biomedical Signal Classification: Proof-of-Concept Study #BiomedicalEngineering #HealthTech #MachineLearning #AIinHealthcare #LSTM