Diyi Yang (@diyiyang) Bsky

We are getting closer to having agents operate in the real physical world. However, can we trust frontier models to make embodied decisions 🎮 aligned with human norms 👩‍⚖️?

With EgoNormia, a 1.8k ego-centric video 🥽 QA benchmark, we show that this is surprisingly challenging!

1 year ago

We (w/ @diyiyang.bsky.social, @zhuhao.me, & Bodhisattwa Prasad Majumder) are excited to present our #NAACL25 tutorial on Social Intelligence in the Age of LLMs!
It will highlight long-standing and emerging challenges of AI interacting w humans, society & the world.
⏰ May 3, 2:00pm-5:30pm Room Pecos

11 months ago

LM agents today primarily aim to automate tasks. Can we turn them into collaborative teammates? 🤖➕👤

Introducing Collaborative Gym (Co-Gym), a framework for enabling & evaluating human-agent collaboration! I have gotten used to agents proactively seeking my confirmation or my deep thinking. (🧵 with video)

1 year ago

Talk Arena: Interactive evaluation for audio models

My first bluesky post will be for my first project as a postdoc at Stanford.

Talk Arena is our first step towards building audio LMs into interactive agents. Try it out and let me know what you think. talkarena.org

1 year ago

Want to add your model to the arena? Have an idea for a new feature for Talk Arena? We are open to collaboration in many forms!

Co-led with @ellaminzhili.bsky.social, in collaboration with @michaelryan207.bsky.social, Kunat Pipatanakul, Potsawee Manakul, @zhuhao.me, and @diyiyang.bsky.social (5/5)

1 year ago

Talk Arena: Interactive Evaluation of Large Audio Models

With an increasing number of Large *Audio* Models 🔊, which one do users like the most?

Introducing talkarena.org — an open platform where users speak to LAMs and receive text responses. Through open interaction, we focus on rankings based on user preferences rather than static benchmarks.
🧵 (1/5)

1 year ago

PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in Action
As language models (LMs) are widely utilized in personalized communication scenarios (e.g., sending emails, writing social media posts) and endowed with a certain level of agency, ensuring they act in...

Check out our paper, code, and data to learn more!

Paper: arxiv.org/abs/2409.00138
Website: salt-nlp.github.io/PrivacyLens/

1 year ago

Excited to present our PrivacyLens paper at #NeurIPS next week! We explore LM agent privacy risks when deployed as personal assistants. (Details in thread)

I am working on developing LM agents as collaborative research partners, learning aids, personal assistants, and more. Let's connect and chat!!

1 year ago

Meet Tülu 3, a set of state-of-the-art instruct models with fully open data, eval code, and training algorithms.
We invented new methods for fine-tuning language models with RL and built upon best practices to scale synthetic instruction and preference data.
Demo, GitHub, paper, and models 👇

1 year ago

Missed some – or all – of our papers at #EMNLP2024?

It's not too late to catch up using this handy list from the Stanford AI Lab blog:

ai.stanford.edu/blog/emnlp-2...

1 year ago

Histogram peaked at 3 minutes and 2 weeks since sent

When I will respond to your email

1 year ago

so far, every Thanksgiving week is writing letters week for me 🤣

1 year ago

🌶️(?) take: Agents are somehow hot right now because people realized that LLM output can be interpreted as a DSL which directs side effects in the world (e.g. tool calls) rather than just returning text in a chat/autocomplete sense. What are the open challenges? A 🧵... [1/11]

1 year ago

I did an unscientific, uncontrolled experiment for #EMNLP2024—details in 🧵👇. I posted my conference & workshop papers to 5 socials. Clear results: Mastodon is near dead, Threads may have users but not my people, not giving up on X/Twitter yet, but Bluesky is worth investing in.

1 year ago

EMNLP 2024 Tutorial: Language Agents: Foundations, Prospects, and Risks

Had a great time doing the language agent tutorial (language-agent-tutorial.github.io) with Yu Su, Shunyu Yao and Tao Yu 😀 #EMNLP2024

Check out our slides here: tinyurl.com/language-age...

1 year ago

Header of poster "Turn your LLM into a Speech LLM in 6 hours without any new data". Find a machine readable version of this poster at https://diva-audio.github.io/

I'll be at the Google Theory and Practice of Foundation Models Workshop today and tomorrow! FOMO for EMNLP, but excited to chat more casually at a smaller non-archival workshop 😅

I am presenting at the Lightning Talks tomorrow at 1:30 PM on our Distilled Voice Assistant model if you're around!

1 year ago

CS 4644 / 7643: Deep Learning - LLM Guest Lecture - Fall 2024
Training Large Language Models. CS 4644 / 7643: Deep Learning. William Held, School of Interactive Computing, Georgia Institute of Technology

Every semester, I drop into Georgia Tech's Deep Learning course to do a speed-through LLM lecture! I keep updating things to balance "history" and recent progress.

Slides for this semester are here for folks who are teaching courses on NLP/DL/LLMs in the near future: docs.google.com/presentation...

1 year ago

I wanted to contribute to "Starter Pack Season" with one for Stanford NLP+HCI: go.bsky.app/VZBhuJ5

Here are some other great starter packs:

- CSS: go.bsky.app/GoEyD7d + go.bsky.app/CYmRvcK
- NLP: go.bsky.app/SngwGeS + go.bsky.app/JgneRQk
- HCI: go.bsky.app/p3TLwt
- Women in AI: go.bsky.app/LaGDpqg

1 year ago
