Posts by arXiv cs.CV Computer Vision and Pattern Recognition

Heming Zhu, Guoxing Sun, Marc Habermann: MUA: Mobile Ultra-detailed Animatable Avatars https://arxiv.org/abs/2604.18583 https://arxiv.org/pdf/2604.18583 https://arxiv.org/html/2604.18583

8 hours ago 0 0 0 0

Aditya Arora, Akshita Gupta, Pau Rodriguez, Marcus Rohrbach: ReCap: Lightweight Referential Grounding for Coherent Story Visualization https://arxiv.org/abs/2604.18575 https://arxiv.org/pdf/2604.18575 https://arxiv.org/html/2604.18575

8 hours ago 0 0 0 0

Savya Khosla, Sethuraman T V, Aryan Chadha, Alex Schwing, Derek Hoiem: T-REN: Learning Text-Aligned Region Tokens Improves Dense Vision-Language Alignment and Scalability https://arxiv.org/abs/2604.18573 https://arxiv.org/pdf/2604.18573 https://arxiv.org/html/2604.18573

8 hours ago 0 0 0 0

A. Sophia Koepke, Daniil Zverev, Shiry Ginosar, Alexei A. Efros: Back into Plato's Cave: Examining Cross-modal Representational Convergence at Scale https://arxiv.org/abs/2604.18572 https://arxiv.org/pdf/2604.18572 https://arxiv.org/html/2604.18572

8 hours ago 0 0 0 0

Haoyu Wu, Jiwen Yu, Yingtian Zou, Xihui Liu: MultiWorld: Scalable Multi-Agent Multi-View Video World Models https://arxiv.org/abs/2604.18564 https://arxiv.org/pdf/2604.18564 https://arxiv.org/html/2604.18564

8 hours ago 0 0 0 0

Rui Qian, Chuanhang Deng, Qiang Huang, Jian Xiong, Mingxuan Li, Yingbo Zhou, Wei Zhai, Jintao Chen, Dejing Dou: AnchorSeg: Language Grounded Query Banks for Reasoning Segmentation https://arxiv.org/abs/2604.18562 https://arxiv.org/pdf/2604.18562 https://arxiv.org/html/2604.18562

8 hours ago 0 0 0 0

Yao, Ma, Zhang, Sun, Xing, Yang, Guo, Liu, Tang: SynAgent: Generalizable Cooperative Humanoid Manipulation via Solo-to-Cooperative Agent Synergy https://arxiv.org/abs/2604.18557 https://arxiv.org/pdf/2604.18557 https://arxiv.org/html/2604.18557

8 hours ago 0 0 0 0

Qihang Fan, Huaibo Huang, Mingrui Chen, Hongmin Liu, Ran He: Advancing Vision Transformer with Enhanced Spatial Priors https://arxiv.org/abs/2604.18549 https://arxiv.org/pdf/2604.18549 https://arxiv.org/html/2604.18549

8 hours ago 0 0 0 0

Fardin, Alam, Fahim, Mahfuz: MetaCloak-JPEG: JPEG-Robust Adversarial Perturbation for Preventing Unauthorized DreamBooth-Based Deepfake Generation https://arxiv.org/abs/2604.18537 https://arxiv.org/pdf/2604.18537 https://arxiv.org/html/2604.18537

8 hours ago 0 0 0 0

Wang, Deng, Pan, Liu, Wang, Zhang, Qi, Wang: UDM-GRPO: Stable and Efficient Group Relative Policy Optimization for Uniform Discrete Diffusion Models https://arxiv.org/abs/2604.18518 https://arxiv.org/pdf/2604.18518 https://arxiv.org/html/2604.18518

8 hours ago 1 0 0 0

Nitish Shukla, Surgan Jandial, Arun Ross: S2H-DPO: Hardness-Aware Preference Optimization for Vision-Language Models https://arxiv.org/abs/2604.18512 https://arxiv.org/pdf/2604.18512 https://arxiv.org/html/2604.18512

8 hours ago 0 0 0 0

Jinghui Lu, et al.: OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation https://arxiv.org/abs/2604.18486 https://arxiv.org/pdf/2604.18486 https://arxiv.org/html/2604.18486

8 hours ago 0 0 0 0

Kangan Qian, et al.: XEmbodied: A Foundation Model with Enhanced Geometric and Physical Cues for Large-Scale Embodied Environments https://arxiv.org/abs/2604.18484 https://arxiv.org/pdf/2604.18484 https://arxiv.org/html/2604.18484

8 hours ago 0 0 0 0

Hao Vo, Khoa Vo, Thinh Phan, Ngo Xuan Cuong, Gianfranco Doretto, Hien Nguyen, Anh Nguyen, Ngan Le: SemLT3D: Semantic-Guided Expert Distillation for Camera-only Long-Tailed 3D Object Detection https://arxiv.org/abs/2604.18476 https://arxiv.org/pdf/2604.18476 https://arxiv.org/html/2604.18476

8 hours ago 0 0 0 0

Cao, Ren, Zhang, Seo, Huang, Solanki, Zhang, Guo, Turki, Li, Zhu, Zhang, Gojcic, Fidler, Yin: Asset Harvester: Extracting 3D Assets from Autonomous Driving Logs for Simulation https://arxiv.org/abs/2604.18468 https://arxiv.org/pdf/2604.18468 https://arxiv.org/html/2604.18468

8 hours ago 0 0 0 0

Zhang, Yang, Han, Hao, Zhuge, Li, Zhao, Li, Chang: Progressive Online Video Understanding with Evidence-Aligned Timing and Transparent Decisions https://arxiv.org/abs/2604.18459 https://arxiv.org/pdf/2604.18459 https://arxiv.org/html/2604.18459

8 hours ago 0 0 0 0

Clayton Fields, Casey Kennington: ESsEN: Training Compact Discriminative Vision-Language Transformers in a Low-Resource Setting https://arxiv.org/abs/2604.18452 https://arxiv.org/pdf/2604.18452 https://arxiv.org/html/2604.18452

8 hours ago 0 0 0 0

Yakoub Bazi, Mohamad M. Al Rahhal, Mansour Zuair, Faroun Mohamed: Revisiting Change VQA in Remote Sensing with Structured and Native Multimodal Qwen Models https://arxiv.org/abs/2604.18429 https://arxiv.org/pdf/2604.18429 https://arxiv.org/html/2604.18429

8 hours ago 0 0 0 0

Jiyao Liu, et al.: MedProbeBench: Systematic Benchmarking at Deep Evidence Integration for Expert-level Medical Guideline https://arxiv.org/abs/2604.18418 https://arxiv.org/pdf/2604.18418 https://arxiv.org/html/2604.18418

8 hours ago 0 0 0 0

Boan Zhang, Wen Li, Guanhua Yu, Xiyang Liu, Wenchao Chen, Long Tian: One-Step Diffusion with Inverse Residual Fields for Unsupervised Industrial Anomaly Detection https://arxiv.org/abs/2604.18393 https://arxiv.org/pdf/2604.18393 https://arxiv.org/html/2604.18393

8 hours ago 0 0 0 0

Chao Yuan, Yujian Zhao, Haoxuan Xu, Guanglin Niu: Towards Robust Text-to-Image Person Retrieval: Multi-View Reformulation for Semantic Compensation https://arxiv.org/abs/2604.18376 https://arxiv.org/pdf/2604.18376 https://arxiv.org/html/2604.18376

8 hours ago 0 0 0 0

Zeeshan Nisar, Friedrich Feuerhake, Thomas Lampert: DSA-CycleGAN: A Domain Shift Aware CycleGAN for Robust Multi-Stain Glomeruli Segmentation https://arxiv.org/abs/2604.18368 https://arxiv.org/pdf/2604.18368 https://arxiv.org/html/2604.18368

8 hours ago 0 0 0 0

Iva Sović, Ivan Martinović, Marin Oršić: EAST: Early Action Prediction Sampling Strategy with Token Masking https://arxiv.org/abs/2604.18367 https://arxiv.org/pdf/2604.18367 https://arxiv.org/html/2604.18367

8 hours ago 0 0 0 0

Zixuan Shen, Zhihua Xia, Kaikai Gan, Peipeng Yu: LBFTI: Layer-Based Facial Template Inversion for Identity-Preserving Fine-Grained Face Reconstruction https://arxiv.org/abs/2604.18358 https://arxiv.org/pdf/2604.18358 https://arxiv.org/html/2604.18358

8 hours ago 0 0 0 0

Haoyue Tan, Shengnan Wang, Yulin Qiao, Juncheng Zhang, Youhui Bai, Ping Gong, Zewen Jin, Cheng Li: AdaCluster: Adaptive Query-Key Clustering for Sparse Attention in Video Generation https://arxiv.org/abs/2604.18348 https://arxiv.org/pdf/2604.18348 https://arxiv.org/html/2604.18348

8 hours ago 0 0 0 0

Lei Zhu, Xing Cai, Yingjie Chen, Yiheng Li, Binxin Yang, Hao Liu, Jie Chen, Chen Li, Jing LYu: OmniHuman: A Large-scale Dataset and Benchmark for Human-Centric Video Generation https://arxiv.org/abs/2604.18326 https://arxiv.org/pdf/2604.18326 https://arxiv.org/html/2604.18326

8 hours ago 0 0 0 0

Yongrui Heng, Chaoya Jiang, Han Yang, Shikun Zhang, Wei Ye: EVE: Verifiable Self-Evolution of MLLMs via Executable Visual Transformations https://arxiv.org/abs/2604.18320 https://arxiv.org/pdf/2604.18320 https://arxiv.org/html/2604.18320

8 hours ago 0 0 0 0

Sa Zhu, Wanqian Zhang, Lin Wang, Jinchao Zhang, Cong Wang, Bo Li: Denoise and Align: Diffusion-Driven Foreground Knowledge Prompting for Open-Vocabulary Temporal Action Detection https://arxiv.org/abs/2604.18313 https://arxiv.org/pdf/2604.18313 https://arxiv.org/html/2604.18313

8 hours ago 0 0 0 0

Qiugang Zhan, Anning Jiang, Ran Tao, Ao Ma, Xiangyu Zhang, Xiurui Xie, Guisong Liu: Spike-NVPT: Learning Robust Visual Prompts via Bio-Inspired Temporal Filtering and Discretization https://arxiv.org/abs/2604.18284 https://arxiv.org/pdf/2604.18284 https://arxiv.org/html/2604.18284

8 hours ago 0 0 0 0

Zepeng Sun, Naichuan Zheng, Hailun Xia, Junjie Wu, Liwei Bao, Xiaotai Zhang: LiquidTAD: An Efficient Method for Temporal Action Detection via Liquid Neural Dynamics https://arxiv.org/abs/2604.18274 https://arxiv.org/pdf/2604.18274 https://arxiv.org/html/2604.18274

8 hours ago 0 0 0 0