
Posts by Yuzhe Yang

Preview
OSF: On Pre-training and Scaling of Sleep Foundation Models
Polysomnography (PSG) provides the gold standard for sleep assessment but suffers from substantial heterogeneity across recording devices and cohorts. There have been growing efforts to build general-...

🚀 OSF turns these findings into a practical recipe for building more generalizable and deployable sleep AI.

➡️Paper: arxiv.org/abs/2603.00190

Great work led by my students @ZitaoShuai, @ZongzheX2001, David, and collaborator
@WeiWang1973! 🌙

#AI #sleep #sensor #health #multimodal #LLMs

1 month ago
Post image

Our third finding: scaling does help in sleep — but only with the right recipe. With the right SSL design, performance keeps improving as we scale:
📦 pre-training data size
🧠 model capacity
🌐 multi-source data mixture

So the message is not just "scale more." It is: scale with the right pre-training design.

1 month ago
Post image

But missing-channel inference is not hopeless.

We find that explicitly encouraging channel-invariant feature learning during pre-training can substantially improve both downstream performance and robustness when channels are missing.
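The idea behind channel invariance can be sketched in a few lines (a toy illustration, not the paper's implementation; all names here are made up): embeddings of the same 30-second epoch, encoded from different channel subsets, are pulled together so the representation does not depend on which channels happened to be recorded.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def channel_invariance_loss(emb_full, emb_subset):
    """1 - cosine: zero when the subset view matches the full-channel view."""
    return 1.0 - cosine(emb_full, emb_subset)

# Toy example: the subset-channel embedding sits close to the
# full-channel embedding, so the penalty is small.
full = [0.9, 0.1, 0.4]
subset = [0.85, 0.15, 0.42]
loss = channel_invariance_loss(full, subset)
```

Any similarity-based penalty of this shape would do; the point is that the objective explicitly rewards encodings that ignore channel availability.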

1 month ago
Post image

Our first finding: existing sleep FMs can break badly under missing-channel inference.

This is not a corner case. In real sleep studies, channel availability changes across cohorts, devices, and protocols.

And when that happens, performance can drop sharply.

1 month ago
Post image

We did not want to run a narrow comparison of one or two methods.

Instead, we benchmarked major self-supervised learning families for sleep FM pre-training:
🔗 contrastive
🧠 self-distillation
♻️ reconstructive
➡️ autoregressive
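To give a flavor of the first family, here is a minimal InfoNCE-style contrastive objective (a self-contained sketch, not the benchmarked code; embeddings and temperature are illustrative):

```python
import math

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Pull the anchor toward its positive view; push it from negatives."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    logits = [dot(anchor, positive) / temperature]
    logits += [dot(anchor, n) / temperature for n in negatives]
    m = max(logits)  # stabilize the log-sum-exp
    log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
    return -(logits[0] - log_denom)

# A matching pair yields near-zero loss; a mismatched pair does not.
loss_match = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])
loss_clash = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

The other three families swap this objective for a teacher-student, masked-reconstruction, or next-step prediction loss while keeping the encoder fixed, which is what makes them comparable in one benchmark.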

1 month ago
Post image

A big reason sleep FM results have been hard to compare is simple: we have not had a unified, open testbed.

We address that with SleepBench — a fully open benchmark built from public resources, with
⏱️ 166,500 hours of sleep recordings
🧑‍🤝‍🧑 21,000+ sleep studies
🌍 9 datasets
💾 ~20M 30s epochs
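As a sanity check on these numbers: sleep is conventionally scored in 30-second epochs, so each hour yields 120 of them, and 166,500 hours comes to roughly 20M epochs.

```python
# Back-of-envelope check of the SleepBench scale.
SECONDS_PER_EPOCH = 30
EPOCHS_PER_HOUR = 3600 // SECONDS_PER_EPOCH  # 120 epochs per hour

def total_epochs(hours):
    """Number of 30-second scoring epochs in `hours` of recording."""
    return hours * EPOCHS_PER_HOUR

n_epochs = total_epochs(166_500)  # 19,980,000, i.e. ~20M
```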

1 month ago
Video

Meet OSF — a fully open benchmark and a family of state-of-the-art sleep foundation models. 🌙

We study pre-training and scaling recipes that actually improve generalization in real-world settings. 🏥

🌐 Website: yang-ai-lab.github.io/osf/
💻 Code: github.com/yang-ai-lab/...
🤗 Models: hf.co/yang-ai-lab/...

1 month ago
Preview
HEARTS: Benchmarking LLM Reasoning on Health Time Series
The rise of large language models (LLMs) has shifted time series analysis from narrow analytics to general-purpose reasoning. Yet, existing benchmarks cover only a small set of health time series moda...

🚀 HEARTS is built as a living ecosystem for the community: new data, new reasoning tasks, and new models can continue to be added over time!
➡️Paper: arxiv.org/abs/2603.06638

#AI #HealthAI #LLM #TimeSeries #Multimodal

1 month ago
Post image

Finally, we asked whether input format changes the story. 🖼️📝

It does affect absolute performance, but much less than one might expect. Whether time series are presented as text, images, or other input forms, the relative task difficulty stays surprisingly stable.

1 month ago
Post image

Models from the same family often behave very similarly. 🧬

Even when absolute performance changes with scale, the overall performance pattern tends to remain stable within a family. That suggests scaling alone is not enough to solve the core reasoning gap.

1 month ago
Post image

We also found a striking temporal difficulty pattern. 📉

Longer sequences and higher sampling frequencies consistently make the tasks harder. Across models, domains, and modalities, performance tends to drop as the temporal burden increases.

1 month ago
Post image

A second takeaway is that many models rely on shortcuts rather than deep reasoning. 🪤

For Perception and Inference tasks, models often do reasonably well when there are explicit thresholds, obvious quantitative cues, or strong domain priors to lean on.

1 month ago
Post image

One clear takeaway: current LLMs are still weak at genuine health time-series reasoning. 📊

We evaluated 14 state-of-the-art LLMs and found that, although many beat a naive baseline, the gains are often modest. On many tasks, they still fall clearly behind specialized time-series models.

1 month ago
Video

To move beyond standard benchmark design, we built a hierarchical task taxonomy. 🏗️

Rather than only asking multiple-choice questions, HEARTS organizes 110 tasks into four cognitive levels:
🧠 Perception
🔍 Inference
✍️ Generation
⚙️ Deduction
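One hypothetical way to represent such a taxonomy (illustrative structure only; task names below are invented, not from HEARTS): register each task under exactly one cognitive level and reject anything outside the hierarchy.

```python
# Hypothetical task registry for a four-level taxonomy.
LEVELS = ("perception", "inference", "generation", "deduction")

def make_taxonomy(tasks):
    """Group (task_name, level) pairs by level, rejecting unknown levels."""
    taxonomy = {level: [] for level in LEVELS}
    for name, level in tasks:
        if level not in taxonomy:
            raise ValueError(f"unknown level: {level}")
        taxonomy[level].append(name)
    return taxonomy

# Invented example tasks, one per level.
toy = make_taxonomy([
    ("trend_detection", "perception"),
    ("anomaly_cause", "inference"),
    ("signal_summary", "generation"),
    ("dose_adjustment", "deduction"),
])
```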

1 month ago
Video

HEARTS also pushes models across a very wide temporal range. ⏱️

Reasoning over health time series is not only about short segments. Models may need to detect fine local structure, track long-range dependencies, or connect patterns across long periods of observation.

1 month ago
Video

One key goal was breadth of real-world physiological coverage. 🏥

Many existing benchmarks rely on synthetic signals or stay within a small number of domains. HEARTS instead brings real-world datasets spanning motion, metabolism, sleep, respiration, surgery, speech, behavior, and more.

1 month ago
Video

Can LLMs really reason over health time series? 📈

Introducing HEARTS ❤️— the first living benchmark built for health time-series reasoning.

🌐Website: yang-ai-lab.github.io/HEARTS
🕵️Code: github.com/yang-ai-lab/...
🤗Dataset: hf.co/datasets/yan...
🏆Leaderboard: yang-ai-lab.github.io/HEARTS/leade...

1 month ago
Preview
SleepLM: Natural-Language Intelligence for Human Sleep
We present SleepLM, a family of sleep-language foundation models that enable human sleep alignment, interpretation, and interaction with natural language. Despite the critical role of sleep, learning-...

SleepLM points to a new direction for sleep AI🚀. Read all about it!
➡️Paper: arxiv.org/abs/2602.23605

Great work led by my students @ZongzheX2001, @ZitaoShuai, Eideen, and amazing collaborators @AysolaRavi and Rajesh!

More to come🌙

#AI #sleep #sensor #health #multimodal #LLMs

1 month ago
Post image

Finally, we wanted this to connect to real clinical workflows. 🏥

SleepLM can combine its predictions across an entire night and produce useful full-night measures, while staying stable over long sequences. This matters because real sleep analysis is about understanding the whole night reliably.
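As a toy illustration of what full-night aggregation means (not SleepLM's code; the stage labels and measures are the standard ones): per-epoch stage predictions can be rolled up into night-level summaries such as total sleep time and sleep efficiency.

```python
# Stages conventionally counted as sleep (everything else is wake).
SLEEP_STAGES = {"N1", "N2", "N3", "REM"}

def night_summary(hypnogram, epoch_seconds=30):
    """Aggregate a whole-night list of per-epoch stage labels.

    Returns (total sleep time in hours, sleep efficiency in [0, 1]).
    """
    sleep_epochs = sum(1 for s in hypnogram if s in SLEEP_STAGES)
    total_sleep_hours = sleep_epochs * epoch_seconds / 3600
    efficiency = sleep_epochs / len(hypnogram)
    return total_sleep_hours, efficiency

# Toy "night": 2 wake epochs, 6 sleep epochs.
tst, eff = night_summary(["Wake", "Wake", "N1", "N2", "N2", "N3", "REM", "N2"])
```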

1 month ago
Post image

We also wanted the model to be more controllable. 🎛️

Instead of always generating one broad description, SleepLM can focus on a specific part of the physiology when asked. For example, it can emphasize 🧠brain activity, 🫁breathing, ❤️heart-related signals, or 💪body movement.

1 month ago
Post image

SleepLM also learns when something happens, not just whether it happened. ⏱️

Our results show that the model is sensitive to timing. The strongest match appears when the text and the signal line up at the correct moment, and that match weakens as the alignment moves away.
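The timing-sensitivity idea can be sketched abstractly (a toy, made-up example, not the evaluation code): score one text-event embedding against candidate signal windows at different temporal positions and check that the similarity peaks at the true moment.

```python
def dot(u, v):
    """Inner-product similarity between two embeddings."""
    return sum(a * b for a, b in zip(u, v))

def alignment_profile(event_emb, window_embs):
    """Similarity of one text (event) embedding to each candidate window."""
    return [dot(event_emb, w) for w in window_embs]

# Toy embeddings: the window at index 2 is the true temporal match,
# so similarity should peak there and decay away from it.
event = [1.0, 0.0]
windows = [[0.1, 0.9], [0.4, 0.6], [0.95, 0.05], [0.3, 0.7]]
profile = alignment_profile(event, windows)
best = max(range(len(profile)), key=profile.__getitem__)
```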

1 month ago
Post image

SleepLM learns a strong link between language and physiology. 🔄

When we ask it to match text to signals, or signals to text, it performs much better than general-purpose baselines. It not only reads sleep signals well — but also learns a shared space where signal and language line up closely.

1 month ago
Post image

One clear takeaway: general LLMs are not enough. 📊

Even strong LLMs 🤖 are not built for dense physiology. They often work with summaries, but struggle when the task depends on subtle waveform structure.

🛌 SleepLM is designed for that setting, and it shows clear gains on zero-shot sleep tasks.

1 month ago
Video

At the core is ReCoCa 🏗️, our unified training framework.

It combines three signals in one objective:
🔗 contrastive alignment
✍️ caption generation
♻️ signal reconstruction

The result is a representation that stays both language-aware and signal-grounded.
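Schematically, a unified objective like this is just a weighted sum of the three terms (hypothetical weights and names; not the actual ReCoCa implementation):

```python
def recoca_style_loss(contrastive, caption, reconstruction,
                      w_con=1.0, w_cap=1.0, w_rec=1.0):
    """One scalar objective combining the three training signals."""
    return w_con * contrastive + w_cap * caption + w_rec * reconstruction

# Toy per-batch loss values for the three terms.
total = recoca_style_loss(0.4, 1.2, 0.3)
```

In practice the interesting design question is how to set the weights so that no single term dominates, which is what keeps the representation both language-aware and signal-grounded.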

1 month ago
Video

Traditional sleep scoring compresses rich signals into a small set of labels. 🧩

We built a multilevel strategy to turn sleep into layered text descriptions. This gives a much richer view of sleep, enabling us to curate the first sleep-language dataset:

🗂️100K+ hours of data from >10,000 people! 🚀

1 month ago
Post image

🌙 What if your sleep signals could speak?

Introducing SleepLM — sleep-language foundation models that turn raw sleep signals into something we can describe, query, and localize with language. 🗣️

🌐Website: yang-ai-lab.github.io/SleepLM
🕵️Code: github.com/yang-ai-lab/...
🤗Models: hf.co/yang-ai-lab/...

🧵👇

1 month ago
Post image

📢 My lab at UCLA is hiring PhD students and postdocs!

Please apply to UCLA CS or CompMed and mention my name if you are interested in foundation models and (Gen)AI for health / medicine / science.

More info: cs.ucla.edu/~yuzhe

4 months ago
Preview
SensorLM: Learning the Language of Wearable Sensors
We present SensorLM, a family of sensor-language foundation models that enable wearable sensor data understanding with natural language. Despite its pervasive nature, aligning and interpreting sensor ...

Read all about it!
➡️Paper: arxiv.org/abs/2506.09108

Huge team effort! Kudos to my intern Evelyn, amazing team @kmr_ayush, @aametwally1, @Orson_Xu, @timalthoff, @pushmeet, @cecim, @xliucs, @danmcduff, and other amazing co-authors!

#AI #wearable #sensor #health #multimodal
(8/8)

10 months ago
Post image

Beyond its discriminative power, SensorLM showcases compelling generative capabilities. It can produce hierarchical and realistic captions from wearable data alone, offering more coherent and correct descriptions than LLMs like Gemini 2.0 Flash. ✍️✨

(7/8)

10 months ago
Post image

SensorLM also demonstrates intriguing capabilities, including consistent scaling behavior over data size, model size, and compute. 📈💡

(6/8)

10 months ago