arXiv cs.AI Artificial Intelligence (@csai-bot) Bsky

Eric Gan, Aryan Bhatt, Buck Shlegeris, Julian Stastny, Vivek Hebbar: ASMR-Bench: Auditing for Sabotage in ML Research https://arxiv.org/abs/2604.16286 https://arxiv.org/pdf/2604.16286 https://arxiv.org/html/2604.16286

Bayer, Lohr, Weiß, Michelberger, Höpken: Using Large Language Models and Knowledge Graphs to Improve the Interpretability of Machine Learning Models in Manufacturing https://arxiv.org/abs/2604.16280 https://arxiv.org/pdf/2604.16280 https://arxiv.org/html/2604.16280

Yunhe Li, Hao Shi, Bowen Deng, Wei Wang, Mengzhe Ruan, Hanxu Hou, Zhongxiang Dai, Siyang Gao, Chao Wang, Shuang Qiu, Linqi Song: Learning to Reason with Insight for Informal Theorem Proving https://arxiv.org/abs/2604.16278 https://arxiv.org/pdf/2604.16278 https://arxiv.org/html/2604.16278

Reham Alharbi, Valentina Tamma, Terry R. Payne, Jacopo de Berardinis: Characterising LLM-Generated Competency Questions: a Cross-Domain Empirical Study using Open and Closed Models https://arxiv.org/abs/2604.16258 https://arxiv.org/pdf/2604.16258 https://arxiv.org/html/2604.16258

Yi Lin, Yihao Ding, Yonghui Wu, Yifan Peng: MARCH: Multi-Agent Radiology Clinical Hierarchy for CT Report Generation https://arxiv.org/abs/2604.16175 https://arxiv.org/pdf/2604.16175 https://arxiv.org/html/2604.16175

Hikaru Shindo, Hanzhao Lin, Lukas Helff, Patrick Schramowski, Kristian Kersting: SocialGrid: A Benchmark for Planning and Social Reasoning in Embodied Multi-Agent Systems https://arxiv.org/abs/2604.16022 https://arxiv.org/pdf/2604.16022 https://arxiv.org/html/2604.16022

Farhad Abtahi, Abdolamir Karbalaie, Eduardo Illueca-Fernandez, Fernando Seoane: MEDLEY-BENCH: Scale Buys Evaluation but Not Control in AI Metacognition https://arxiv.org/abs/2604.16009 https://arxiv.org/pdf/2604.16009 https://arxiv.org/html/2604.16009

Qiang Xu, Shengyuan Bai, Yu Wang, He Cao, Leqing Chen, Yuanyuan Liu, Bin Feng, Zijing Liu, Yu Li: ReactBench: A Benchmark for Topological Reasoning in MLLMs on Chemical Reaction Diagrams https://arxiv.org/abs/2604.15994 https://arxiv.org/pdf/2604.15994 https://arxiv.org/html/2604.15994

Haoyu Bian, Chaoning Zhang, Jiaquan Zhang, Xingyao Li, Yuanfang Guo, Wei Dong, Yang Yang: Weak-Link Optimization for Multi-Agent Reasoning and Collaboration https://arxiv.org/abs/2604.15972 https://arxiv.org/pdf/2604.15972 https://arxiv.org/html/2604.15972

Hamed Jelodar, Samita Bai, Mohammad Meymani, Parisa Hamedi, Roozbeh Razavi-Far, Ali Ghorbani: Integrating Graphs, Large Language Models, and Agents: Reasoning and Retrieval https://arxiv.org/abs/2604.15951 https://arxiv.org/pdf/2604.15951 https://arxiv.org/html/2604.15951

Olivier Létoffé, Xuanxiang Huang, Joao Marques-Silva: Towards Rigorous Explainability by Feature Attribution https://arxiv.org/abs/2604.15898 https://arxiv.org/pdf/2604.15898 https://arxiv.org/html/2604.15898

Xing Zhang, Guanghui Wang, Yanwei Cui, Wei Qiu, Ziyuan Li, Bing Zhu, Peiyang He: Experience Compression Spectrum: Unifying Memory, Skills, and Rules in LLM Agents https://arxiv.org/abs/2604.15877 https://arxiv.org/pdf/2604.15877 https://arxiv.org/html/2604.15877

Liu, Yin, Yuan, Xie, Li, Li, Shen, Xu, Shang, Zhang: Discover and Prove: An Open-source Agentic Framework for Hard Mode Automated Theorem Proving in Lean 4 https://arxiv.org/abs/2604.15839 https://arxiv.org/pdf/2604.15839 https://arxiv.org/html/2604.15839

Thomas Landais, Olivier Goudet, Adrien Goëffon, Frédéric Saubion, Sylvain Lamprier: Stein Variational Black-Box Combinatorial Optimization https://arxiv.org/abs/2604.15837 https://arxiv.org/pdf/2604.15837 https://arxiv.org/html/2604.15837

Ankit Maloo: KWBench: Measuring Unprompted Problem Recognition in Knowledge Work https://arxiv.org/abs/2604.15760 https://arxiv.org/pdf/2604.15760 https://arxiv.org/html/2604.15760

Sankalp Gilda, Shlok Gilda: Structured Abductive-Deductive-Inductive Reasoning for LLMs via Algebraic Invariants https://arxiv.org/abs/2604.15727 https://arxiv.org/pdf/2604.15727 https://arxiv.org/html/2604.15727

Wenshuo Wang: LLM Reasoning Is Latent, Not the Chain of Thought https://arxiv.org/abs/2604.15726 https://arxiv.org/pdf/2604.15726 https://arxiv.org/html/2604.15726

Wei, Gao, Han, Chen, Zhuang, Guan, Zhang, Cheng, He, Chen, Li, Shi, Duan, Zheng: The World Leaks the Future: Harness Evolution for Future Prediction Agents https://arxiv.org/abs/2604.15719 https://arxiv.org/pdf/2604.15719 https://arxiv.org/html/2604.15719

Chenyi Huang, Haoting Zhang, Jingxu Xu, Zeyu Zheng, Yunduan Lin: Bilevel Optimization of Agent Skills via Monte Carlo Tree Search https://arxiv.org/abs/2604.15709 https://arxiv.org/pdf/2604.15709 https://arxiv.org/html/2604.15709

Jacob Dang, Brian Y. Xie, Omar G. Younis: Subliminal Transfer of Unsafe Behaviors in AI Agent Distillation https://arxiv.org/abs/2604.15559 https://arxiv.org/pdf/2604.15559 https://arxiv.org/html/2604.15559

Saad Alqithami: Preregistered Belief Revision Contracts https://arxiv.org/abs/2604.15558 https://arxiv.org/pdf/2604.15558 https://arxiv.org/html/2604.15558

Yang Li, Zirui Zhang, Yang Liu, Chengzhi Mao: LACE: Lattice Attention for Cross-thread Exploration https://arxiv.org/abs/2604.15529 https://arxiv.org/pdf/2604.15529 https://arxiv.org/html/2604.15529

Dipto Das, Christelle Tessono, Syed Ishtiaque Ahmed, Shion Guha: Bureaucratic Silences: What the Canadian AI Register Reveals, Omits, and Obscures https://arxiv.org/abs/2604.15514 https://arxiv.org/pdf/2604.15514 https://arxiv.org/html/2604.15514

Shivendra Agrawal, Bradley Hayes: GIST: Multimodal Knowledge Extraction and Spatial Grounding via Intelligent Semantic Topology https://arxiv.org/abs/2604.15495 https://arxiv.org/pdf/2604.15495 https://arxiv.org/html/2604.15495

Zhizheng Wang, et al.: DeepER-Med: Advancing Deep Evidence-Based Research in Medicine Through Agentic AI https://arxiv.org/abs/2604.15456 https://arxiv.org/pdf/2604.15456 https://arxiv.org/html/2604.15456

[2026-04-21 Tue (UTC), 25 new articles found for csAI Artificial Intelligence]
