
Posts by Ryo Kamoi

Our paper VisOnlyQA has been accepted to
@colmweb.org #COLM2025! See you in Montreal🍁
We find that even recent Vision Language Models struggle with simple questions about geometric properties in images, such as "What is the degree of angle AOD?"🧐
arxiv.org/abs/2412.00947
bsky.app/profile/ryok...

9 months ago

Excited to share that Communications of the ACM featured an article that includes an interview with me about LLM self-correction! I mainly discuss self-correction before o1, but I believe it still offers some takeaways.
cacm.acm.org/news/self-co...
arxiv.org/abs/2406.01297

1 year ago

VLMEvalKit now supports our VisOnlyQA dataset πŸ”₯πŸ”₯πŸ”₯
github.com/open-compass...

VisOnlyQA reveals that even recent LVLMs like GPT-4o and Gemini 1.5 Pro stumble on simple visual perception questions, e.g., "What is the degree of angle AOD?"🧐
arxiv.org/abs/2412.00947

1 year ago

VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception of Geometric Information

Ryo Kamoi, Yusen Zhang, Sarkar Snigdha Sarathi Das, Ranran Haoran Zhang, Rui Zhang

Paper: arxiv.org/abs/2412.00947
Data: huggingface.co/collections/...
Code: github.com/psunlpgroup/...

1 year ago

Interestingly, our experiments suggest that stronger language models improve the visual perception of LVLMs, even when the same visual encoder (ViT) is used.

We conclude that we need to improve both the training data and model architecture of LVLMs for better visual perception. [4/n]

1 year ago

We hypothesize that the weak visual perception is due to a lack of training data. To verify this, we create training data for VisOnlyQA, but performance after fine-tuning varies across tasks and models, suggesting that training data is not the only problem. [3/n]

1 year ago

VisOnlyQA includes questions about geometric and numerical information on scientific figures.
Recent benchmarks for LVLMs often involve reasoning or knowledge, placing less focus on visual perception. In contrast, VisOnlyQA is designed to evaluate visual perception directly. [2/n]

1 year ago

πŸ“’ New preprint! Do LVLMs have strong visual perception capabilities? Not quite yet...
We introduce VisOnlyQA, a new dataset for evaluating the visual perception of LVLMs, and find that existing LVLMs perform poorly on it. [1/n]
arxiv.org/abs/2412.00947
github.com/psunlpgroup/...

1 year ago

I’m on the academic job market this year! I’m completing my @uwcse.bsky.social @uwnlp.bsky.social Ph.D. (2025), focusing on overcoming LLM limitations, such as hallucinations, by building new LMs.
My Ph.D. work focuses on Retrieval-Augmented LMs to create more reliable AI systems 🧡

1 year ago

This reading list is based on our survey paper. Don't forget to check it out as well πŸ˜‰

When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs (TACL 2024)
arxiv.org/abs/2406.01297

1 year ago

Curious about LLM self-correction? Check out our reading list!
πŸ“š github.com/ryokamoi/llm...

We feature papers & blogs in
* Key self-correction papers
* Negative results in self-correction
* Projects inspired by OpenAI o1

1 year ago

A starter pack for the NLP and Computational Linguistics researchers at UT Austin!
go.bsky.app/75g9JLT

1 year ago

We at UT Linguistics are hiring for πŸ”₯ 2 faculty positions in Computational Linguistics! Assistant or Associate Professor level; deadline Dec 1.
UT has a super vibrant comp ling & #nlp community!!

Apply here πŸ‘‰ apply.interfolio.com/158280

1 year ago

Hello Bluesky. It was great to talk with so many people at
#EMNLP2024!
The paper we presented, a survey on self-correction of LLMs, is now published by MIT Press!

When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs (TACL 2024)
direct.mit.edu/tacl/article...

1 year ago