Our paper VisOnlyQA has been accepted to
@colmweb.org #COLM2025! See you in Montreal!
We find that even recent Vision Language Models struggle with simple questions about geometric properties in images, such as "What is the degree of angle AOD?"
arxiv.org/abs/2412.00947
Excited to share that Communications of the ACM published an article featuring an interview with me about LLM self-correction! I mainly discuss self-correction before o1, but I believe it still offers useful takeaways.
cacm.acm.org/news/self-co...
arxiv.org/abs/2406.01297
VLMEvalKit now supports our VisOnlyQA dataset!
github.com/open-compass...
VisOnlyQA reveals that even recent LVLMs like GPT-4o and Gemini 1.5 Pro stumble on simple visual perception questions, e.g., "What is the degree of angle AOD?"
arxiv.org/abs/2412.00947
VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception of Geometric Information
Ryo Kamoi, Yusen Zhang, Sarkar Snigdha Sarathi Das, Ranran Haoran Zhang, Rui Zhang
Paper: arxiv.org/abs/2412.00947
Data: huggingface.co/collections/...
Code: github.com/psunlpgroup/...
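If you want to try VisOnlyQA yourself, loading it with the Hugging Face datasets library should look roughly like the sketch below. The repository id and column names are placeholders (the collection link above is truncated), so check the dataset card for the exact ones.

from datasets import load_dataset

# Placeholder repository id -- replace it with the actual dataset name
# from the Hugging Face collection linked above.
visonlyqa = load_dataset("<hf-user>/VisOnlyQA", split="test")

example = visonlyqa[0]
print(example.keys())  # inspect the available columns (image, question, options, answer, ...)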
Interestingly, our experiments suggest that stronger language models improve the visual perception of LVLMs, even when they use the same visual encoder (ViT).
We conclude that we need to improve both the training data and model architecture of LVLMs for better visual perception. [4/n]
We hypothesize that the weak visual perception stems from a lack of training data. To verify this, we create training data for VisOnlyQA, but performance after fine-tuning varies across tasks and models, suggesting that training data is not the only problem. [3/n]
VisOnlyQA includes questions about geometric and numerical information in scientific figures.
Recent benchmarks for LVLMs often involve reasoning or knowledge, putting less focus on visual perception. In contrast, VisOnlyQA is designed to evaluate visual perception directly. [2/n]
New preprint! Do LVLMs have strong visual perception capabilities? Not quite yet...
We introduce VisOnlyQA, a new dataset for evaluating the visual perception of LVLMs, and find that existing LVLMs perform poorly on it. [1/n]
arxiv.org/abs/2412.00947
github.com/psunlpgroup/...
I'm on the academic job market this year! I'm completing my @uwcse.bsky.social @uwnlp.bsky.social Ph.D. (2025), focusing on overcoming LLM limitations, such as hallucinations, by building new LMs.
My Ph.D. work focuses on Retrieval-Augmented LMs to create more reliable AI systems.
This reading list is based on our survey paper. Don't forget to check it out as well.
When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs (TACL 2024)
arxiv.org/abs/2406.01297
Curious about LLM self-correction? Check out our reading list!
github.com/ryokamoi/llm...
We feature papers & blogs on:
* Key self-correction papers
* Negative results in self-correction
* Projects inspired by OpenAI o1
A starter pack for the NLP and Computational Linguistics researchers at UT Austin!
go.bsky.app/75g9JLT
We at UT Linguistics are hiring for 2 faculty positions in Computational Linguistics! Assistant or Associate Professors; deadline Dec 1.
UT has a super vibrant comp ling & #nlp community!!
Apply here: apply.interfolio.com/158280
Hello Bluesky. It was great to talk with so many people at
#EMNLP2024!
The paper we presented, a survey on self-correction of LLMs, is now available from MIT Press!
When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs (TACL 2024)
direct.mit.edu/tacl/article...