A Conversational Platform (Okaya) for Multimodal Digital Biomarkers of Fatigue, Cognition, and Mental Health: Feasibility Observational Study
Background: Collection of multimodal data (video, audio, and text) can yield digital biomarkers relevant to mental health, fatigue, and cognition. However, the feasibility of such collection and the signal characteristics in operational populations remain underexplored.

Objective: The objectives of this study were to (1) extract an evidence-based library of vision, speech, and language features; (2) assess the feasibility of a fully remote conversational platform (Okaya) for collecting analyzable multimodal data; and (3) conduct preliminary signal checks for depression, fatigue, and cognition.

Methods: Participants were recruited from the US Air Force and US Space Force. All participants completed the Okaya check-in, which included a voice conversation with a large language model. A total of 66 visual, acoustic, and text features were extracted from each interaction between the participant and the large language model. For validation purposes, the study also collected measures of depression (Patient Health Questionnaire–9), fatigue (Cancer Fatigue Scale), and cognition (trail making test). We evaluated the feasibility of the platform and the correlation between the extracted features and the validated assessments.

Results: A total of 8 unique participants contributed 62 sessions between March 6, 2025, and August 6, 2025. The platform was deemed feasible: 6 of the 8 participants opted to complete more than one session, and the 3 participants who provided feedback rated the overall experience and usability highly.
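The preliminary signal checks described above pair each extracted feature with a validated assessment score across sessions. As a minimal sketch of that kind of check (not the study's actual pipeline), the following computes a Pearson correlation coefficient and its t statistic from hypothetical per-session values; the feature values and PHQ-9 scores shown are illustrative only.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical per-session values: one acoustic feature (mean pitch, Hz)
# alongside the PHQ-9 depression score from the same session.
pitch = [190.2, 185.1, 178.4, 201.7, 172.0, 168.3]
phq9 = [4, 5, 9, 2, 11, 12]

r = pearson_r(pitch, phq9)
n = len(pitch)
# t statistic for H0: r = 0; comparing it against a t distribution with
# n - 2 degrees of freedom (e.g., via scipy.stats.t.sf) yields the kind
# of P values reported in the Results section.
t = r * math.sqrt((n - 2) / (1 - r ** 2))
```

In practice a statistics library (e.g., scipy.stats.pearsonr) would return r and the two-sided P value directly; the manual form above just makes the computation explicit.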
From the data perspective, preliminary correlations produced significant results for multiple potential digital biomarkers, including (1) pitch (P=.047), volume SD (P=.04), volume slope (P=.04), automated readability index complexity (P=.047), Flesch-Kincaid complexity (P=.04), and Gunning Fog complexity (P=.04) for depression; (2) pitch (P=.009), volume SD (P=.007), volume slope (P=.02), average F2 formant frequency (P=.03), Gunning Fog complexity (P=.049), and eyelid droop (P=.047) for fatigue; and (3) shimmer (P=.03) for cognition. We also observed how features varied over time among participants with multiple sessions.

Conclusions: The conversational and artificial intelligence (AI)–enabled platform was a feasible method for collecting, in an operational sample, multimodal data correlated with depression, fatigue, and cognition. These results align with previously discovered digital biomarkers of mental health, fatigue, and cognition and inform the development of personalized models for each user while detecting anomalies in a remote monitoring setting.