
Posts by Siyuan Song

abstract:
siyuansong.site/assets/pdf/e...

3 weeks ago

Just arrived in Boston for #HSP2026!

I'll be presenting my work with @thomashikaru.bsky.social on error sensitivity in next-word predictions of humans and LMs — Friday 12:10–2:00pm poster session. Come say hi!

3 weeks ago
Laboratory Coordinator - 138788 | Careers at UC San Diego

I'm hiring a new lab manager for my lab @ UCSD! For more info on the lab, check out our website: lillab.ucsd.edu

Target start date is June 1 (flexible) and application deadline is March 26. Please share with anyone you think might be a good fit!

Apply here: employment.ucsd.edu/laboratory-c...

1 month ago
Title section of the paper: “Cross-Modal Taxonomic Generalization in (Vision) Language Models” by Tianyang Xu, Marcelo Sandoval-Castañeda, Karen Livescu, Greg Shakhnarovich, Kanishka Misra.

What is the interplay between representations learned from (language) surface forms alone, and those learned from more grounded evidence (e.g., vision)?

Excited to share new work understanding “Cross-modal taxonomic generalization” in (V)LMs

arxiv.org/abs/2603.07474

1/

1 month ago

Can large language models *introspect*?

In a new paper, @kmahowald.bsky.social and I study the MECHANISM of introspection in big open-source models.

tldr: Models detect internal anomalies through DIRECT ACCESS, but don't know what the anomalies are.

And they love to guess “apple” 🍎

1 month ago

*CoNLL is not an AI conference*

1 month ago
Title page of our paper: "Bears, all bears, and some bears. Language Constraints on Language Models' Inductive Inferences"

“All bears have a property”, “Some bears have a property”, “Bears have a property” are different in terms of how the property is generalized to a specific bear – a great example of how language constrains thought!

This holds for kids, adults, and according to our new work, (V)LMs! 🧵

2 months ago

Our first South by Semantics lecture of the semester at UT Austin is happening next week on January 30th!

I'm excited to hear Dr. Amir Zeldes (Associate Professor at Georgetown University) talk about saliency in discourse and the memorability of salient information for both humans and LLMs.

3 months ago

🧑‍🔬I’m recruiting PhD students in Natural Language Processing at @unileipzig.bsky.social Computer Science, together with @scadsai.bsky.social!

Topics include, but aren’t limited to:

🔎Linguistic Interpretability
🌍Multilingual Evaluation
📖Computational Typology

Please share!

#NLProc #NLP

4 months ago
Figure 1 showing alignment pipeline using CLIP models on BabyView data.

Figure 2: human judgments are correlated with CLIP scores.

Can we use VLMs to quantify multimodal alignment in children's experiences? We analyze a large corpus of headcam videos to find out!

New preprint from our BabyView project, led by @alvinwmtan.bsky.social and Jane Yang: arxiv.org/abs/2511.18824

4 months ago

Looking forward to #NeurIPS25 this week 🏝️! I'll be presenting at Poster Session 3 (11-2 on Thursday). Feel free to reach out!

4 months ago

I’m excited to present SimpleStories at EurIPS!

Also if anyone at #EurIPS is interested in chatting about LLM data efficiency, interpretability, model inconsistency or other topics feel free to DM me.

Dataset and models: lnkd.in/e_VGWqhP
Code: lnkd.in/eEidmv74
Paper: lnkd.in/eH6jS9uY

4 months ago

String probability might be the best tool for assessing LMs' grammatical knowledge, yet it does not directly tell you 'how grammatical' a string is. Here's why and how we should use string probability and minimal pairs:
Excited to see this out - it's my great honor to be part of this amazing team!

5 months ago

Oh cool! Excited this LM + construction paper was SAC-Highlighted! Check it out to see how LM-derived measures of statistical affinity separate out constructions with similar words like "I was so happy I saw you" vs "It was so big it fell over".

5 months ago

Delighted Sasha's (first year PhD!) work using mech interp to study complex syntax constructions won an Outstanding Paper Award at EMNLP!

Also delighted the ACL community continues to recognize unabashedly linguistic topics like filler-gaps... and the huge potential for LMs to inform such topics!

5 months ago

Interested in doing a PhD at the intersection of human and machine cognition? ✨ I'm recruiting students for Fall 2026! ✨

Topics of interest include pragmatics, metacognition, reasoning, & interpretability (in humans and AI).

Check out JHU's mentoring program (due 11/15) for help with your SoP 👇

5 months ago

🧠 New at #NeurIPS2025!
🎵 We're far from the shallow now🎵
TL;DR: We introduce the first "reasoning embedding" and uncover its unique spatio-temporal pattern in the brain.

🔗 arxiv.org/abs/2510.228...

5 months ago

Introducing Global PIQA, a new multilingual benchmark for 100+ languages. This benchmark is the outcome of this year’s MRL shared task, in collaboration with 300+ researchers from 65 countries. This dataset evaluates physical commonsense reasoning in culturally relevant contexts.

5 months ago

Very excited to be going to Chicago for @agnescallard.bsky.social's famous Night Owls next week! I'll be discussing my essay "ChatGPT and the Meaning of Life". Hope to see you there if you're local!

5 months ago
Title of our paper: “Hey, wait a minute: on at-issue sensitivity in Language Models” by Sanghee Kim and Kanishka Misra.

Below: A person says “Sue, Max’s girlfriend, was a tennis champ!”; a second person responds with “What racket does she use?” (which targets at-issue content); a third person replies with “They’re dating?” (which targets not-at-issue content)

If I spill the tea—“Did you know Sue, Max’s gf, was a tennis champ?”—but then if you reply “They’re dating?!” I’d be a bit puzzled, since that’s not the main point! Humans can track what’s ‘at issue’ in conversation. How sensitive are LMs to this distinction?

New paper w/ @sangheekim.bsky.social!

6 months ago

I will be recruiting PhD students via Georgetown Linguistics this application cycle! Come join us in the PICoL (pronounced “pickle”) lab. We focus on psycholinguistics and cognitive modeling using LLMs. See the linked flyer for more details: bit.ly/3L3vcyA

6 months ago
Title page of the paper: WUGNECTIVES: Novel Entity Inferences of Language Models from Discourse Connectives, with two figures at the bottom

Left: Our figure 1 -- comparing previous work, which usually predicted the connective given the arguments (grounded in the world); our work flips this premise by getting models to use their knowledge of connectives to predict something about the world.

Right: Our main results across 7 types of connective senses. Models are especially bad at Concession connectives.

"Although I hate leafy vegetables, I prefer daxes to blickets." Can you tell if daxes are leafy vegetables? LMs can't seem to!

We investigate if LMs capture these inferences from connectives when they cannot rely on world knowledge.

New paper w/ Daniel, Will, @jessyjli.bsky.social

6 months ago

Gonna keep updating it regularly and have some fun with the resources we’ve got to grow & test Chinese BabyLMs🐣stay tuned!

6 months ago

Honored to get the chance to contribute to the Chinese dataset! And had a great time working with all the awesome collaborators!

6 months ago

Excited to present this at COLM tomorrow! (Tuesday, 11:00 AM poster session)

6 months ago

I will be giving a short talk on this work at the COLM Interplay workshop on Friday (also to appear at EMNLP)!

Will be in Montreal all week and excited to chat about LM interpretability + its interaction with human cognition and ling theory.

6 months ago
Both Direct and Indirect Evidence Contribute to Dative Alternation Preferences in Language Models
Language models (LMs) tend to show human-like preferences on a number of syntactic phenomena, but the extent to which these are attributable to direct exposure to the phenomena or more general propert...

Traveling to my first @colmweb.org🍁

Not presenting anything but here are two posters you should visit:

1. @qyao.bsky.social on Controlled rearing for direct and indirect evidence for datives (w/ me, @weissweiler.bsky.social and @kmahowald.bsky.social), W morning

Paper: arxiv.org/abs/2503.20850

6 months ago

On my way to #COLM2025 🍁

Check out jessyli.com/colm2025

QUDsim: Discourse templates in LLM stories arxiv.org/abs/2504.09373

EvalAgent: retrieval-based eval targeting implicit criteria arxiv.org/abs/2504.15219

RoboInstruct: code generation for robotics with simulators arxiv.org/abs/2405.20179

6 months ago
Language Models Fail to Introspect About Their Knowledge of Language
There has been recent interest in whether large language models (LLMs) can introspect about their own internal states. Such abilities would make LLMs more interpretable, and also validate the use of s...

I’m at #COLM2025 from Wed with:

@siyuansong.bsky.social Tue am introspection arxiv.org/abs/2503.07513

@qyao.bsky.social Wed am controlled rearing: arxiv.org/abs/2503.20850

@sashaboguraev.bsky.social INTERPLAY ling interp: arxiv.org/abs/2505.16002

I’ll talk at INTERPLAY too. Come say hi!

6 months ago

Heading to #COLM2025 to present my first paper w/ @jennhu.bsky.social @kmahowald.bsky.social !

When: Tuesday, 11 AM – 1 PM
Where: Poster #75

Happy to chat about my work and topics in computational linguistics & cogsci!

Also, I'm on the PhD application journey this cycle!

Paper info 👇:

6 months ago