
Posts by Jiaang Li

๐Ÿ™ Huge thanks to all collaborators @yfyuan01.bsky.social, @wenyan62.bsky.social, @aliannejadi.bsky.social, @danielhers.bsky.social, Anders Sรธgaard, Ivan Vuliฤ‡, Wenxuan Zhang, Paul Liang, Yang Deng, @serge.belongie.com

5 days ago

📄 Paper: arxiv.org/abs/2505.14462
🌐 Project: jiaangli.github.io/ravenea/
💻 Code: github.com/yfyuan01/RAV...
🤗 Data: huggingface.co/datasets/jaa...

5 days ago

🎉 Excited to share our work "RAVENEA: A Benchmark for Multimodal Retrieval-Augmented Visual Culture Understanding", accepted at #ICLR2026!

🇧🇷 I'll be attending ICLR in person, would love to connect and chat there! 🤝
🗓️ Sat, Apr 25, 2026, 10:30 AM – 1:00 PM GMT-03
📍 Pavilion 4 P4-# 3618

5 days ago

Feeling overwhelmed by all the recent developments in video understanding? What used to require dozens of modular computational workflows involving SLAM, feature tracking, optical flow, camera calibration, multiview geometric constraints, and ResNet backbones is now...

(1/3)

1 month ago

Feel free to reach out and chat with Xinyi on July 18th in Vancouver at #ICML

9 months ago

Would you present your next NeurIPS paper in Europe instead of traveling to San Diego (US) if this was an option? Søren Hauberg (DTU) and I would love to hear the answer through this poll: (1/6)

1 year ago

Check out our new preprint TensorGRaD.
We use a robust decomposition of the gradient tensors into low-rank + sparse parts to reduce optimizer memory for Neural Operators by up to 75%, while matching the performance of Adam, even on turbulent Navier–Stokes (Re = 10^5).
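The low-rank + sparse idea can be sketched in a few lines. The following NumPy toy is a hypothetical illustration, not the TensorGRaD implementation (the function name, shapes, ranks, and the sparse-then-SVD ordering are all my assumptions): it peels off the largest-magnitude entries of a gradient as a sparse part, approximates the remainder with a truncated SVD, and counts how many floats the optimizer state would actually need to keep.

```python
import numpy as np

def sparse_plus_lowrank(grad, rank=8, k=16):
    """Toy robust split: largest-|entry| sparse part + truncated-SVD low-rank part."""
    flat = grad.ravel()
    idx = np.argpartition(np.abs(flat), -k)[-k:]      # indices of the top-k magnitudes
    vals = flat[idx]
    rest = grad.copy()
    rest.ravel()[idx] = 0.0                           # remove spikes before the SVD
    U, s, Vt = np.linalg.svd(rest, full_matrices=False)
    return idx, vals, U[:, :rank] * s[:rank], Vt[:rank]

rng = np.random.default_rng(0)
G = rng.normal(size=(64, 8)) @ rng.normal(size=(8, 64))  # low-rank "gradient"
G[3, 5] += 50.0                                          # plus one sparse spike
idx, vals, Ufac, Vt = sparse_plus_lowrank(G)

# Reconstruct and compare storage: full tensor vs. factors + sparse entries.
S = np.zeros(G.size)
S[idx] = vals
approx = Ufac @ Vt + S.reshape(G.shape)
rel_err = np.linalg.norm(G - approx) / np.linalg.norm(G)
stored = Ufac.size + Vt.size + 2 * len(idx)              # ~1/4 of G.size here
```

Storing only the two factors and the few sparse entries in place of dense optimizer state is where a memory saving of this kind would come from.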

10 months ago

PhD student Jiaang Li and his collaborators share insights into the cultural understanding of vision-language models 👇

10 months ago
Paper title "Cultural Evaluations of Vision-Language Models Have a Lot to Learn from Cultural Theory"

I am excited to announce our latest work 🎉 "Cultural Evaluations of Vision-Language Models Have a Lot to Learn from Cultural Theory". We review recent works on culture in VLMs and argue for deeper grounding in cultural theory to enable more inclusive evaluations.

Paper 🔗: arxiv.org/pdf/2505.22793

10 months ago

Great collaboration with @yfyuan01.bsky.social @wenyan62.bsky.social @aliannejadi.bsky.social @danielhers.bsky.social, Anders Søgaard, Ivan Vulić, Wenxuan Zhang, Paul Liang, Yang Deng, @serge.belongie.com

10 months ago

๐Ÿ”—More here:
Project Page: jiaangli.github.io/RAVENEA/
Code: github.com/yfyuan01/RAV...
Dataset: huggingface.co/datasets/jaa...

10 months ago

📊 Our experiments demonstrate that even lightweight VLMs, when augmented with culturally relevant retrievals, outperform their non-augmented counterparts and even surpass the next larger model tier, achieving at least a 3.2% improvement in cVQA and 6.2% in cIC.

10 months ago

🏛 Culture-Aware Contrastive Learning

We propose Culture-aware Contrastive (CAC) Learning, a supervised learning framework compatible with both CLIP and SigLIP architectures. Fine-tuning with CAC can help models better capture culturally significant content.
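The post doesn't spell out the objective, so here is a hedged NumPy sketch of what a culture-aware contrastive loss could look like. The function name, the use of culture labels as the positive mask, and the temperature are my assumptions, not the paper's exact CAC formulation:

```python
import numpy as np

def cac_loss(img_emb, txt_emb, culture_ids, temperature=0.07):
    """Illustrative culture-aware contrastive loss: image/text pairs sharing
    a culture label count as positives, all other pairs as negatives."""
    # Cosine-similarity logits between every image and every text.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature
    # Positive mask built from the culture labels.
    pos = (culture_ids[:, None] == culture_ids[None, :]).astype(float)
    # Per-image InfoNCE averaged over the positives.
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -(pos * log_prob).sum(axis=1) / pos.sum(axis=1)

rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 32))
cultures = np.arange(8)          # one culture per pair -> plain InfoNCE
loss = cac_loss(emb, emb, cultures)
```

When several items share a culture label, the mask groups multiple captions or documents per image, which is the "culture-aware" twist over plain CLIP-style contrastive training.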

10 months ago

📚 Dataset Construction
RAVENEA integrates 1,800+ images, 2,000+ culture-related questions, 500+ human captions, and 10,000+ human-ranked Wikipedia documents to support two key tasks:

🎯 Culture-focused Visual Question Answering (cVQA)
📝 Culture-informed Image Captioning (cIC)

10 months ago

🚀 New Preprint 🚀
Can Multimodal Retrieval Enhance Cultural Awareness in Vision-Language Models?

Excited to introduce RAVENEA, a new benchmark aimed at evaluating cultural understanding in VLMs through RAG.
arxiv.org/abs/2505.14462

More details: 👇

10 months ago

Super cool! Incidentally, in our previous project, we also found that linear alignment between embedding spaces from two modalities is viable, and the alignment improves as LLMs scale.
bsky.app/profile/jiaa...
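For intuition, the linear-alignment test can be reproduced on toy data. This is a generic least-squares recipe under synthetic assumptions (the embedding sizes, noise level, and precision@1 evaluation are illustrative, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for two modalities embedding the same 200 concepts:
# the "vision" space is a linear transform of the "language" space plus
# noise, so a single linear map should align the two spaces well.
lang = rng.normal(size=(200, 16))
vision = lang @ rng.normal(size=(16, 16)) + 0.05 * rng.normal(size=(200, 16))

# Fit the map on the first 150 paired concepts (ordinary least squares).
W, *_ = np.linalg.lstsq(lang[:150], vision[:150], rcond=None)

# Evaluate precision@1 retrieval on the 50 held-out concepts.
pred = lang[150:] @ W
dists = ((pred[:, None, :] - vision[None, 150:, :]) ** 2).sum(-1)
p_at_1 = (dists.argmin(axis=1) == np.arange(50)).mean()
```

Precision@1 is essentially perfect here because the ground-truth relation is linear by construction; the interesting empirical finding is that real LM/VM embedding spaces behave similarly, and increasingly so with scale.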

10 months ago

I won't be attending #ICLR in person this year 😢. But feel free to check out our paper 'Revisiting the Othello World Model Hypothesis' with Anders Søgaard, accepted at the ICLR World Models Workshop!
Paper link: arxiv.org/abs/2503.04421

1 year ago

Thrilled to announce "Multimodality Helps Few-shot 3D Point Cloud Semantic Segmentation" is accepted as a Spotlight (5%) at #ICLR2025!

Our model MM-FSS leverages 3D, 2D, & text modalities for robust few-shot 3D segmentation, all without extra labeling cost. 🤩

arxiv.org/pdf/2410.22489

More details 👇

1 year ago

Forget just thinking in words.

🔔 Our New Preprint:
🚀 New Era of Multimodal Reasoning 🚨
🔍 Imagine While Reasoning in Space with MVoT

Multimodal Visualization-of-Thought (MVoT) revolutionizes reasoning by generating visual "thoughts" that transform how AI thinks, reasons, and explains itself.

1 year ago

FGVC12 Workshop is coming to #CVPR 2025 in Nashville!

Are you working on fine-grained visual problems?
This year we have two peer-reviewed paper tracks:
i) 8-page CVPR Workshop proceedings
ii) 4-page non-archival extended abstracts
CALL FOR PAPERS: sites.google.com/view/fgvc12/...

1 year ago
VidenSkaber | "Min AI forstår mig ikke" ("My AI doesn't understand me"), professor Serge Belongie. YouTube video by Videnskabernes Selskab

Here's a short film produced by the Danish Royal Academy of Sciences, showcasing the WineSensed 🍷 project of Þóranna Bender et al. thoranna.github.io/learning_to_...

1 year ago

From San Diego to New York to Copenhagen, wishing you Happy Holidays! 🎄

1 year ago

With @neuripsconf.bsky.social right around the corner, we're excited to be presenting our work soon! Here's an overview

(1/5)

1 year ago

Here's a starter pack with the members of our lab who have joined Bluesky

1 year ago
[Video: a panda bear rolling around in the grass in a zoo enclosure]

No one can explain stochastic gradient descent better than this panda.

1 year ago

๐Ÿ™‹โ€โ™‚๏ธ

1 year ago

Great collaboration with @constanzafierro.bsky.social, @YovaKem_v2, and Anders Søgaard!

๐Ÿ‘จโ€๐Ÿ’ป github.com/jiaangli/VLCA
๐Ÿ“ƒ direct.mit.edu/tacl/article...

1 year ago

🚀 Takeaways:

1. Representation spaces of LMs and VMs grow more (partially) similar with model size.
2. Words with lower frequency, polysemy, and dispersion can be easier to align.
3. Shared concepts between LMs and VMs might extend beyond nouns.

🧵 (7/8)
#NLP #NLProc

1 year ago

🌱 We then discuss the implications of our findings:
- the LM understanding debate
- the study of emergent properties
- philosophy

🧵 (6/8)

1 year ago

๐Ÿ”We also measure the generalization of the mapping to other POS, and explore the impact of different size of the training data. ๐Ÿ‘€To investigate the effects of incorporating text signals during vision pretraining, we compare pure vision models against selected CLIP vision encoders.

๐Ÿงต(5/8)

1 year ago