#VisionLanguageModels

💬 We thank Prof. Kementchedjhieva for the insightful talk and the discussion with UKP members on multimodal modeling and the future of vision-language systems.

#UKPLab #MultimodalAI #VisionLanguageModels #NLP #GuestTalk #NLProc #MBZUAI @tuda.bsky.social @cs-tudarmstadt.bsky.social

LEGS Trains 3.5x Faster Than LERF in Large-Scale Indoor Mapping

LEGS embeds language into 3D Gaussian splats, training 3.5x faster than LERF while improving pose fidelity in large-scale indoor scenes. #visionlanguagemodels

Bundle Adjustment Makes or Breaks 3D Gaussian Splats, Study Finds

LEGS matches LERF in object recall while cutting training time to 12 minutes, showing how bundle adjustment boosts 3D Gaussian splat quality. #visionlanguagemodels

Robots Learn to “See” With Language in Real Time Using 3D Gaussian Splatting

A real-time robotic system that builds 3D indoor maps and localizes objects using open-vocabulary language queries and Gaussian Splatting. #visionlanguagemodels

New System Combines SLAM and Language Models for Online 3D Scene Mapping

A new system merges 3D Gaussian Splatting and language models to enable real-time semantic mapping and object localization for robots. #visionlanguagemodels

Researchers Develop a Real-Time 3D Mapping System That Helps Robots Understand Natural Language

A new system fuses language models with 3D Gaussian Splatting to help robots build real-time, semantic maps 3.5x faster than existing methods. #visionlanguagemodels

How Multi-Stage Reasoning Helps AI Understand What Cities Mean

How a new vision-language AI uses multi-stage reasoning to identify schools, parks, and hospitals—going beyond pixels to understand cities. #visionlanguagemodels

MLLM Adapters: Review of VPGs and Multimodal Fusion

Reviews state-of-the-art MLLMs. Highlights the challenge of expanding current models beyond the simple one-to-one image-text relationship. #visionlanguagemodels

I will be at EMNLP next week, presenting this work on November 7th! Reach out to me with any questions :))

Work done with my advisor, Mirella Lapata!

Preprint: arxiv.org/pdf/2505.14627
#EMNLP2025 #multimodallearning #scalableoversight #visionlanguagemodels #nlproc

New Dataset PerSense-D Enables Model-Agnostic Dense Object Segmentation

PerSense-D is a new benchmark dataset for personalized dense image segmentation, advancing AI accuracy in crowded visual environments. #visionlanguagemodels

PerSense Delivers Expert-Level Instance Recognition Without Any Training

PerSense's training-free one-shot segmentation framework combines adaptive prompts, density maps, and VLMs for dense image interpretation. #visionlanguagemodels

PerSense: A One-Shot Framework for Personalized Segmentation in Dense Images

PerSense is a model-agnostic, training-free framework for one-shot personalized instance segmentation in dense images, based on density and vision-language cues. #visionlanguagemodels

Reason‑RFT improves visual reasoning in vision‑language models

Reason-RFT improves visual reasoning in vision-language models, according to the announcement. Read more: getnews.me/reason-rft-improves-visu... #reasonrft #visionlanguagemodels #visualreasoning

MetaSpatial improves 3D spatial reasoning in vision-language models

MetaSpatial improves 3D spatial reasoning in vision-language models, according to the announcement. Read more: getnews.me/metaspatial-improves-3d-... #metaspatial #3dspatial #visionlanguagemodels

Vision-Language Models Struggle with Compositional Counting

VLMCountBench shows vision‑language models can count objects when only one shape type (triangles, circles, or squares) appears, but accuracy drops in scenes with multiple shapes. Read more: getnews.me/vision-language-models-s... #visionlanguagemodels #counting

Vision‑Language Models Linked to Action Expert for Robot Planning

A new framework pairs vision‑language models with an action expert that refines sparse 3‑D waypoints into collision‑free motion plans, trained on synthetic and real point‑cloud data. getnews.me/vision-language-models-l... #visionlanguagemodels #robotics

DepthLM Achieves Accurate Metric Depth with Vision‑Language Models

DepthLM equips vision-language models with metric depth prediction, matching the accuracy of dedicated depth estimators, per the paper submitted on 1 Oct 2025. Read more: getnews.me/depthlm-achieves-accurat... #depthlm #visionlanguagemodels

CoFFT Boosts Vision Language Models with Iterative Focused Reasoning

CoFFT, a training-free technique introduced on 1 Oct 2025, lifts Vision Language Model accuracy by 3.1%–5.8% by iteratively sharpening visual focus during inference. getnews.me/cofft-boosts-vision-lang... #visionlanguagemodels #cofft

Vision-Language Models Restore Spatial Awareness with New Diagnostic Tools

Three new diagnostics—PSI, CMB, RoPE probe—show VLMs favor visual tokens; reducing visual token norms raised PSI and improved spatial reasoning. Read more: getnews.me/vision-language-models-r... #visionlanguagemodels #spatialreasoning

Explanation-Driven Counterfactual Testing Boosts Faithfulness of Vision-Language Model Explanations

EDCT audits VLM explanation faithfulness on 120 OK‑VQA examples, showing many explanations are plausible but not causally linked to answers. Read more: getnews.me/explanation-driven-count... #edct #visionlanguagemodels

Capability-Attributed Data Curation Improves Vision-Language Models

CADC reduces required training data to about 5% of the original set while still outperforming full-data models on multimodal benchmarks, the authors report. Read more: getnews.me/capability-attributed-da... #visionlanguagemodels #capabilitycuration

Future of AD Security: Addressing Limitations and Ethical Concerns in Typographic Attack Research

This paper summarizes a comprehensive framework for typographic attacks, demonstrating their effectiveness and transferability against Vision-LLMs like LLaVA. #visionlanguagemodels

Empirical Study: Evaluating Typographic Attack Effectiveness Against Vision-LLMs in AD Systems

This article presents an empirical study on the effectiveness and transferability of typographic attacks against major Vision-LLMs using AD-specific datasets. #visionlanguagemodels

Foreground vs. Background: Analyzing Typographic Attack Placement in Autonomous Driving Systems

This article explores the physical realization of typographic attacks, categorizing their deployment into background and foreground elements. #visionlanguagemodels

GSM8K-V Shows Vision Language Models Lag on Visual Math Problems

GSM8K‑V adds visual format to 1,319 grade‑school math problems. Gemini‑2.5‑Pro scores 95.22% on text but only 46.93% on the visual version, showing a gap for VLMs. getnews.me/gsm8k-v-shows-vision-lan... #gsm8kv #visionlanguagemodels

Exploiting Vision-LLM Vulnerability: Enhancing Typographic Attacks with Instructional Directives

This article proposes a linguistic augmentation scheme for typographic attacks using explicit instructional directives. #visionlanguagemodels

Methodology for Adversarial Attack Generation: Using Directives to Mislead Vision-LLMs

This article details the multi-step typographic attack pipeline, including Attack Auto-Generation and Attack Augmentation. #visionlanguagemodels

Disentangling Text for Better Language‑Based Object Detection

TaSe splits queries into object, attribute, and relation parts, then hierarchically recombines them, delivering a 24% boost on the OmniLabel benchmark. Read more: getnews.me/disentangling-text-for-b... #visionlanguagemodels #objectdetection

The Dual-Edged Sword of Vision-LLMs in AD: Reasoning Capabilities vs. Attack Vulnerabilities

This article analyzes the critical safety trade-off of integrating Vision-LLMs into autonomous driving (AD) systems. #visionlanguagemodels

Vision-Language Models' Ability to Name Colors Evaluated

A study tested five vision‑language models on 957 color samples and found high accuracy for prototypical colors but lower performance on non‑prototypical shades across nine languages. getnews.me/vision-language-models-a... #visionlanguagemodels #colornaming
