Eric Gan, Aryan Bhatt, Buck Shlegeris, Julian Stastny, Vivek Hebbar: ASMR-Bench: Auditing for Sabotage in ML Research https://arxiv.org/abs/2604.16286 https://arxiv.org/pdf/2604.16286 https://arxiv.org/html/2604.16286
Bayer, Lohr, Weiß, Michelberger, Höpken: Using Large Language Models and Knowledge Graphs to Improve the Interpretability of Machine Learning Models in Manufacturing https://arxiv.org/abs/2604.16280 https://arxiv.org/pdf/2604.16280 https://arxiv.org/html/2604.16280
Yunhe Li, Hao Shi, Bowen Deng, Wei Wang, Mengzhe Ruan, Hanxu Hou, Zhongxiang Dai, Siyang Gao, Chao Wang, Shuang Qiu, Linqi Song: Learning to Reason with Insight for Informal Theorem Proving https://arxiv.org/abs/2604.16278 https://arxiv.org/pdf/2604.16278 https://arxiv.org/html/2604.16278
Reham Alharbi, Valentina Tamma, Terry R. Payne, Jacopo de Berardinis: Characterising LLM-Generated Competency Questions: a Cross-Domain Empirical Study using Open and Closed Models https://arxiv.org/abs/2604.16258 https://arxiv.org/pdf/2604.16258 https://arxiv.org/html/2604.16258
Yi Lin, Yihao Ding, Yonghui Wu, Yifan Peng: MARCH: Multi-Agent Radiology Clinical Hierarchy for CT Report Generation https://arxiv.org/abs/2604.16175 https://arxiv.org/pdf/2604.16175 https://arxiv.org/html/2604.16175
Hikaru Shindo, Hanzhao Lin, Lukas Helff, Patrick Schramowski, Kristian Kersting: SocialGrid: A Benchmark for Planning and Social Reasoning in Embodied Multi-Agent Systems https://arxiv.org/abs/2604.16022 https://arxiv.org/pdf/2604.16022 https://arxiv.org/html/2604.16022
Farhad Abtahi, Abdolamir Karbalaie, Eduardo Illueca-Fernandez, Fernando Seoane: MEDLEY-BENCH: Scale Buys Evaluation but Not Control in AI Metacognition https://arxiv.org/abs/2604.16009 https://arxiv.org/pdf/2604.16009 https://arxiv.org/html/2604.16009
Qiang Xu, Shengyuan Bai, Yu Wang, He Cao, Leqing Chen, Yuanyuan Liu, Bin Feng, Zijing Liu, Yu Li: ReactBench: A Benchmark for Topological Reasoning in MLLMs on Chemical Reaction Diagrams https://arxiv.org/abs/2604.15994 https://arxiv.org/pdf/2604.15994 https://arxiv.org/html/2604.15994
Haoyu Bian, Chaoning Zhang, Jiaquan Zhang, Xingyao Li, Yuanfang Guo, Wei Dong, Yang Yang: Weak-Link Optimization for Multi-Agent Reasoning and Collaboration https://arxiv.org/abs/2604.15972 https://arxiv.org/pdf/2604.15972 https://arxiv.org/html/2604.15972
Hamed Jelodar, Samita Bai, Mohammad Meymani, Parisa Hamedi, Roozbeh Razavi-Far, Ali Ghorbani: Integrating Graphs, Large Language Models, and Agents: Reasoning and Retrieval https://arxiv.org/abs/2604.15951 https://arxiv.org/pdf/2604.15951 https://arxiv.org/html/2604.15951
Olivier Létoffé, Xuanxiang Huang, Joao Marques-Silva: Towards Rigorous Explainability by Feature Attribution https://arxiv.org/abs/2604.15898 https://arxiv.org/pdf/2604.15898 https://arxiv.org/html/2604.15898
Xing Zhang, Guanghui Wang, Yanwei Cui, Wei Qiu, Ziyuan Li, Bing Zhu, Peiyang He: Experience Compression Spectrum: Unifying Memory, Skills, and Rules in LLM Agents https://arxiv.org/abs/2604.15877 https://arxiv.org/pdf/2604.15877 https://arxiv.org/html/2604.15877
Liu, Yin, Yuan, Xie, Li, Li, Shen, Xu, Shang, Zhang: Discover and Prove: An Open-source Agentic Framework for Hard Mode Automated Theorem Proving in Lean 4 https://arxiv.org/abs/2604.15839 https://arxiv.org/pdf/2604.15839 https://arxiv.org/html/2604.15839
Thomas Landais, Olivier Goudet, Adrien Goëffon, Frédéric Saubion, Sylvain Lamprier: Stein Variational Black-Box Combinatorial Optimization https://arxiv.org/abs/2604.15837 https://arxiv.org/pdf/2604.15837 https://arxiv.org/html/2604.15837
Ankit Maloo: KWBench: Measuring Unprompted Problem Recognition in Knowledge Work https://arxiv.org/abs/2604.15760 https://arxiv.org/pdf/2604.15760 https://arxiv.org/html/2604.15760
Sankalp Gilda, Shlok Gilda: Structured Abductive-Deductive-Inductive Reasoning for LLMs via Algebraic Invariants https://arxiv.org/abs/2604.15727 https://arxiv.org/pdf/2604.15727 https://arxiv.org/html/2604.15727
Wenshuo Wang: LLM Reasoning Is Latent, Not the Chain of Thought https://arxiv.org/abs/2604.15726 https://arxiv.org/pdf/2604.15726 https://arxiv.org/html/2604.15726
Wei, Gao, Han, Chen, Zhuang, Guan, Zhang, Cheng, He, Chen, Li, Shi, Duan, Zheng: The World Leaks the Future: Harness Evolution for Future Prediction Agents https://arxiv.org/abs/2604.15719 https://arxiv.org/pdf/2604.15719 https://arxiv.org/html/2604.15719
Chenyi Huang, Haoting Zhang, Jingxu Xu, Zeyu Zheng, Yunduan Lin: Bilevel Optimization of Agent Skills via Monte Carlo Tree Search https://arxiv.org/abs/2604.15709 https://arxiv.org/pdf/2604.15709 https://arxiv.org/html/2604.15709
Jacob Dang, Brian Y. Xie, Omar G. Younis: Subliminal Transfer of Unsafe Behaviors in AI Agent Distillation https://arxiv.org/abs/2604.15559 https://arxiv.org/pdf/2604.15559 https://arxiv.org/html/2604.15559
Saad Alqithami: Preregistered Belief Revision Contracts https://arxiv.org/abs/2604.15558 https://arxiv.org/pdf/2604.15558 https://arxiv.org/html/2604.15558
Yang Li, Zirui Zhang, Yang Liu, Chengzhi Mao: LACE: Lattice Attention for Cross-thread Exploration https://arxiv.org/abs/2604.15529 https://arxiv.org/pdf/2604.15529 https://arxiv.org/html/2604.15529
Dipto Das, Christelle Tessono, Syed Ishtiaque Ahmed, Shion Guha: Bureaucratic Silences: What the Canadian AI Register Reveals, Omits, and Obscures https://arxiv.org/abs/2604.15514 https://arxiv.org/pdf/2604.15514 https://arxiv.org/html/2604.15514
Shivendra Agrawal, Bradley Hayes: GIST: Multimodal Knowledge Extraction and Spatial Grounding via Intelligent Semantic Topology https://arxiv.org/abs/2604.15495 https://arxiv.org/pdf/2604.15495 https://arxiv.org/html/2604.15495
Zhizheng Wang, et al.: DeepER-Med: Advancing Deep Evidence-Based Research in Medicine Through Agentic AI https://arxiv.org/abs/2604.15456 https://arxiv.org/pdf/2604.15456 https://arxiv.org/html/2604.15456
[2026-04-21 Tue (UTC), 25 new articles found for cs.AI Artificial Intelligence]