1. Benchmarking Open-Ended Multi-Agent Coordination in Language Agents
Kale-ab Abebe Tessera, Andras Szecsenyi, Cameron Barker, Alexander Rutherford, Davide Paglieri, Aidan Scannell, Henry Gouk, Elliot J. Crowley, Tim Rocktäschel, Amos Storkey
3. VATS: Exploiting Implicit Authority in Error-Path Injection via Systematic Mutation
4. The Cold-Start Safety Gap in LLM Agents
Chung-En Sun, Linbo Liu, Tsui-Wei Weng
5. Overcoming the Regulatory Bottleneck via Agent-to-Agent Protocols: A Nuclear Case Study
Akshay J. Dave, David Grabaskas, Joseph A. Renevitz, Richard B. Vilim
6. A Multi-modal Agentic Co-pilot for Evidence Grounded Computational Pathology
Zhe Xu, Zhengyu Zhang, Zhiyuan Cai, Jiahao Xu, Yijie Lin, ..., Yihui Wang, Yingxue Xu, Ronald Cheong Kin Chan, Li Liang, Hao Chen
7. REFLECT: Intervention-Supported Error Attribution for Silent Failures in LLM Agent Traces
Xiaofeng Lin, Yingxu Wang, Tung Sum Thomas Kwok, Daniel Guo, Sahil Arun Nale, Charles Fleming, Guang Cheng
8. ViMax: Agentic Video Generation
Lingxuan Huang, Sizhe He, Hengji Zhou, Liqiang Nie, Lianghao Xia, Chao Huang
9. SAGE: An LLM-driven Self Reflective Agentic Framework for Fraud Detection
Yichen Chen, Siying Li, Yuhang Liang, Lijun Wang, Renyang Liu
10. PathoSage: Towards Multi-Source Evidence Adjudication in Pathology via Experience-Aware Agentic Workflow
Chengyang Zhang, Wenchuan Zhang, Bo Li, Mengran Li, Bob Zhang, Yuhao Yi, Hong Bu, Jiancheng Lv
11. AlloSpatial: Agentic Harness Framework for Spatial Reasoning in Foundation Models
Shouwei Ruan, Bin Wang, Zhenyu Wu, Qihui Zhu, Yuxiang Zhang, Jingzhi Li, Yubin Wang, Xingxing Wei
12. Baichuan-M4: A Clinical-Grade Medical Agent System for Continuous Care
Aiyuan Yang, Chengfeng Dou, Da Pan, Dian Wang, Fan Yang, ..., Yichuan Mo, Canbin Piao, Leyi Pan, Yihe Luo, Zian Wang
13. IS-CoT: Breaking the Long-form Generation Collapse via Interleaved Structural Thinking
Zechen Sun, Yuyang Sun, Zecheng Tang, Juntao Li, Wenpeng Hu, Wenliang Chen, Zhunchen Luo, Guotong Geng, Min Zhang
14. Collective Hallucination in Multi-Agent LLMs: Modeling and Defense
15. MAVIS: Multi-Agent Video Retrieval via Structured Video Understanding
Jie Zhang, Qilang Ye, Hao Zhou, Haochen Liang, Fei Luo
16. Agentic Neuro-Symbolic Planning and Commissioning for Human-in-the-Loop Industrial Robotics with Digital Twins
Zhihao Liu, Victor Nan Fernandez-Ayala, Tianyu Wang, Qiang Qin, Xi Vincent Wang, Dimos V. Dimarogonas, Lihui Wang
17. MemToolAgent overview with a simple restaurant booking scenario where the agent retrieves similar memories, receives feedback on an invalid time format, and generates a reflection to update its memory
Suleyman Armagan Er, Danilo Ribeiro, Yogesh Virkar, Surafel Lakew, Adi Kalyanpur, James Gung, Thomas Delteil, Arshit Gupta
18. RAILS: Verification-Native Clearing For Agentic Commerce
Adrian de Valois-Franklin, Alex Bogdan
19. Bayesian-Agent: Posterior-Guided Skill Evolution for LLM Agent Harnesses
Xiaojun Wu, Cehao Yang, Honghao Liu, Xueyuan Lin, Wenjie Zhang, Zhichao Shi, Xuhui Jiang, Chengjin Xu, Jia Li, Jian Guo
20. Brain-Prompt Injection: A Route-Safety Audit for BCI-LLM Agents
21. VideoWeaver: Evaluating and Evolving Skills for Agentic Long Video Generation
Jianhui Wei, Jie Tan, Hengchuan Zhu, Xiaotian Zhang, Yan Zhang, Ziyi Chen, Daoan Zhang, Wei Xu, Zuozhu Liu
22. A Multi-Agent System for IPMSM Design Optimization via an FEA-AI Hybrid Approach
Jinseong Han, Sunwoong Yang, Namwoo Kang
23. Distilling LLM Reasoning into an Interpretable Policy Tree for Human-AI Collaboration
Beiwen Zhang, Yongheng Liang, Guowei Zou, Haitao Wang, Hejun Wu
24. Scaffold Effects on GAIA: A Controlled Comparison
25. DICE: Entropy-Regularized Equilibrium Selection for Stable Multi-Agent LLM Coordination
Yi Xie, Zhanke Zhou, Chentao Cao, Bo Liu, Bo Han
26. MAGIS: Evidence-Based Multi-Agent Reasoning for Interpretable Strabismus Clinical Decision-Making
Xikai Tang, Yifan Wang, Jiafan Zhuang, Li Luo, Jinming Guo, ..., Jie Cen, Guangqiang Yin, Kunliang Qiu, Ce Zheng, Zhun Fan
27. Bidirectional Semantic Complementary Tool Retrieval for Remote Sensing Agents
Zeyuan Wang, Dongyang Hou, Cheng Yang, Xuezhi Cui, Linrui Xu, ..., Liangtian Liu, Kai Ouyang, Wang Guo, Lili Zhu, Chao Tao
28. PBSD: Privileged Bayesian Self-Distillation for Long-Horizon Credit Assignment
Yang Tian, Rui Wang, Xumeng Wen, Junjie Li, Shizhao Sun, Lei Song, Jiang Bian, Bo Zhao
29. SSR: Can Simulated Patients Learn to Stigmatize Themselves? Modeling Self-Stigma through Internal Monologue
Kunyao Lan, Bingrui Jin, Zichen Zhu, Mengyue Wu
30. Struct-Searcher: Agentic Structural Thinking Advances Multimodal Deep Information Seeking
Fan Zhang, Vireo Zhang, Shengju Qian, Haoxuan Li, Zheng Lian, Hao Wu, Yuan Gao, Xinyu Geng, Xin Wang, Pheng-Ann Heng
31. LUNA-AD: Lightweight Uncertainty-Aware Language Model with Lifelong Learning for Autonomous Driving
Ruoyu Yao, Pei Liu, Ruiguo Zhong, Mingxing Peng, Rui Yang, Jun Ma
32. Syll: Open-Source Personal Automation with Cross-Surface Execution
Bo Zhang, Borui Zhang, Chenghao Jiang, Minglei Shi, Xiaofeng Wang, Zheng Zhu, Jie Zhou, Jiwen Lu
33. PRISM: Recovering Instruction Sets from Language Model Activations
Gilad Gressel, Rahul Pankajakshan, Julia Diament, Efim Hudis, Krishnashree Achuthan, Yisroel Mirsky
34. (Auto)formalization is supposed to be easy: Trellis process semantics for spelling out rigorous proofs
35. SearchSwarm: Towards Delegation Intelligence in Agentic LLMs for Long-Horizon Deep Research
Pu Ning, Quan Chen, Kun Tao, Xinyu Tang, Tianshu Wang, Qianggang Cao, Xinyu Kong, Zujie Wen, Zhiqiang Zhang, Jun Zhou
36. Causal Agent Replay: Counterfactual Attribution for LLM-Agent Failures
37. Representational Similarity and Model Behavior in Multi-Agent Interaction
Yujin Potter, Seun Eisape, Shiyang Lai, Alexander Huth, James Evans, Been Kim, Jacob Eisenstein, Dawn Song, Alane Suhr
38. From Player to Master: Enhancing Test-Time Learning of LLM Agents via Reinforcement Learning over Memory
Yishuo Cai, Xingyu Guo, Xuancheng Huang, Jinhua Du, Can Huang, ..., Wenhan Ma, Yuyang Hu, Aohan Zeng, Jie Tang, Xu Sun
39. HDSL: A Hierarchical Domain-Specific Language for Structured 3D Indoor Scene Generation and Localized Editing with LLM Agents
Letian Li, Chao Shen, Shuzhao Xie, Chenghao Gu, ZhengXiao He, Yu Meng, Xin Yang, Wenyuan Jiang, Zhi Wang
40. Continual Quadruped Robots Coordination via Semantic Skill Discovery
Daoqing Wang, Yuchen Xiao, Weixuan Huang, Zhilong Zhang, Shenghua Wan, Meng Li, Lei Yuan, Yang Yu
41. To Nuke or Not to Nuke: LLMs' (Missing) Ethical Reasoning and Actions in a High-Stakes Decision-Making Simulation
John Chen, Sihan Cheng, Can Gurkan, H M Abdul Fattah
42. ConMem: Structured Memory-Guided Adaptation in Training-Free Multi-Agent Systems
Zhixun Tan, Qiang Chen, Tairan Huang, Xiu Su, Yi Chen
43. RunAgent SuperBrowser: A Theory of Autonomous Web Navigation Grounded in Human Browsing Behaviour
44. Rosetta Memory: Adaptive Memory for Cross-LLM Agents
Hao Yang, Shiqi Shen, Haoxuan Li, Zhipeng Wang, Zhi Gong, Xu Chen
45. Byzantine Cheap Talk: Adversarial Resilience and Topology Effects in LLM Coordination Games
Aya El Mir, Martin Takáč, Salem Lahlou
46. QueryWeaver: Reliable Multi-Tool Query Execution Planning via LLM-Based Graph Generation
Aishwarya Chakravarthy, Vidhi Kulkarni, Duen Horng Chau
47. SLMJury: Can Small Language Models Judge as Well as Large Ones?
Anish Laddha, Nitesh Pradhan, Gaurav Srivastava
49. Bridging the Agent-World Gap: Text World Models for LLM-based Agents
Yixia Li, Hongru Wang, Peng Lai, Zhiwen Ruan, He Zhu, ..., Jeff Z. Pan, Jia Pan, Guanhua Chen, Yang Liu, Guanbin Li
50. H2HMem: A Multimodal Memory Benchmark for Agents in Human-Human Interactions
Shiping Zhu, Yibo Yang, Zhengyang Wang, Tiancheng Shen, Dandan Guo, Ming-Hsuan Yang
51. Memory Beyond Recall: A Dual-Process Cognitive Memory System for Self-Evolving LLM Agents
Tianxiang Fei, Mingyang Song, Mao Zheng, Xiang Yu
52. Hallucination Cascade: Analyzing Error Propagation in Multi-Agent LLM Systems
Saeid Jamshidi, Arghavan Moradi Dakhel, Kawser Wazed Nafi, Foutse Khomh
53. SGTO-MAS: Secure Gorilla Troops Optimization for Multi-Agent LLM Systems
54. Observability for Delegated Execution in Agentic AI Systems
55. Visual Para-Thinker++: A Single-Policy Multi-Agent Framework for Visual Reasoning
Haoran Xu, Hongyu Wang, Yifei Gao, Jiaze Li, Zizhao Tong, Xiaofeng Zhang, Xiaosong Yuan
56. PhysAgent: Automating Physics-Based 4D Synthesis via Trajectory-Grounded Multi-Agent Feedback
Chunji Lv, Jiaxi Ye, Yuchen Jiang, Rexar Lin, Changsheng Li
57. A case study of evaluating AI agents on a neuroscience data-to-discovery pipeline
Kai A. Horstmann, Ethan Lin, Alice A. Robie, Jennifer J. Sun, Kristin Branson
58. Beyond Goodhart's Law: A Dynamic Benchmark for Evaluating Compliance in Multi-Agent Systems
Yiyang Zhao, Zhuo Zhang, Qingxuan Le, Lizhen Qu, Zenglin Xu
59. PaperMentor: A Human-Centered Multi-Agent Writing Tutor for AI Research Papers on Overleaf
Jiarui Liu, Terry Jingchen Zhang, Ryan Faulkner, X. Angelo Huang, Vilém Zouhar, ..., Zeju Qiu, Sankalan Pal Chowdhury, Bernhard Schölkopf, Mona Diab, Zhijing Jin
60. From Statute to Control Flow: Span-Grounded Deontic Trees for Defeasible Scope Parsing
Jian Chen, Siyuan Li, Chucheng Wan, Zixuan Yuan
61. POISE: Position-Aware Undetectable Skill Injection on LLM Agents
Haochang Hao, Dehai Min, Zhifang Zhang, Yunbei Zhang, Miao Xu, Yingqiang Ge, Lu Cheng
62. IEA: Amateur-Friendly Conversational Image Editing Agent via Three Stages of Multitask Alignment
Zichen Zhu, Yuheng Sun, Mingxuan Zhu, Wenjie Ma, Situo Zhang, ..., Zihan Zhao, Dingye Liu, Siqi Xiang, Lu Chen, Kai Yu
63. SceneConductor: 3D Scene Generation from Single Image with Multi-Agent Orchestration
Jeonghwan Kim, Yushi Lan, Yongwei Chen, Hieu Trung Nguyen, Chuanyu Pan, Xingang Pan
64. TVI-CoT: Text-Visual Interleaved Chain-of-Thought Reasoning for Multimodal Understanding
Lianyu Hu, Xiaoyu Ma, Zeqin Liao, Yang Liu
65. A multi-agent system for spine MRI report generation from multi-sequence imaging
Zhiping Xiao, Junwei Yang, Gongbo Sun, Han Zhang, Hanwen Xu, ..., Sammy Chu, Ming Zhang, Paul E. Kinahan, Nathan M. Cross, Sheng Wang
66. OmniGameArena: A Unified UE5 Benchmark for VLM Game Agents with Improvement Dynamics
Mingxian Lin, Shengju Qian, Yuqi Liu, Yi-Hua Huang, Yiyu Wang, ..., Fan Zhang, Zeyu Hu, Lingting Zhu, Xin Wang, Xiaojuan Qi
67. SpaceVLN: A Zero-Shot Vision-and-Language Navigation Agent with Online Spatial Cognitive Memory and Reasoning
Yucheng Deng, Pingrui Lai, Xinhai Li, Chenjia Bai, Xiaoheng Deng, Chengnuo Sun, Xuelong Li, Hua Yang
68. The ACUTE Protocol: Operationalizing Language Model Activations for Better Calibration, Utility, and Trust
Nishant Subramani, Palash Goyal, Yiwen Song, Mani Malek, Yuan Xue, Tomas Pfister, Hamid Palangi
69. Where Instruction Hierarchy Breaks: Diagnosing and Repairing Failures in Reasoning Language Models
Sanjay Kariyappa, G. Edward Suh
70. Contract2Tool: Learning Preconditions and Effects for Reliable Tool-Augmented LLM Agents
Rahul Suresh Babu, Laxmipriya Ganesh Iyer
71. Decision-Aware Memory Cards: Counterfactual-Inspired Context Selection and Compression for Tool-Using LLM Agents
Xinyu Guan, Qianyang Zhao, Yuming Deng
73. VESTA: A Fully Automated Scenario Generation and Safety Evaluation Framework for LLM Agents
Lu Jia, Haibo Tong, Feifei Zhao, Jindong Li, Dongqi Liang, Ping Wu, Qian Zhang, Yi Zeng
74. Quantitative Promise Theory: Intentionality and Inference in Autonomous Agents
75. Oversight Has a Capacity: Calibrating Agent Guards to a Subjective, Fatiguing Human
76. The Token Not Taken: Sampling, State, and the Variability of AI Agent Outputs
Muhammad Zia Hydari, Raja Iqbal
77. Agent Economics: An Entropy-Controlled Pluralistic Alignment Framework for Preventing Artificial Hivemind in Autonomous Agents
78. Capability-Aligned Hierarchical Learning for Tool-Augmented LLMs
Haotong Yang, Ting Long, Yi Chang
79. AliyunConsoleAgent: Training Web Agents in Real-World Cloud Environments via Distillation and Reinforcement Learning
Bojie Rong, Zheyu Shen, Qiaoping Wang, Pengfei Kang, Yang Xu, Yawen Wei, Hanyu Wu, Zhi Zhao, Leihao Pei, Linquan Jiang
80. Optical Reasoning: Rethinking Images as an Expressive Reasoning Medium Beyond Text
Yutong Bian, Dongjie Cheng, Heming Xia, Yongqi Li, Wenjie Li
81. Collaborative Human-Agent Protocol (CHAP)
Arsalan Shahid, Gordon Suttie, Philip Black
82. SPIN: Decentralized Swarm Control via Tensorized Policy Coordination
83. ResearchClawBench: A Benchmark for End-to-End Autonomous Scientific Research
Wanghan Xu, Shuo Li, Tianlin Ye, Qinglong Cao, Yixin Chen, ..., Guangtao Zhai, Wanli Ouyang, Bo Zhang, Lei Bai, Wenlong Zhang
84. WhiFlash: Accelerating Speculative Decoding with Token-Level Cross-Paradigm Routing
Young D. Kwon, Miles Williams, Rui Li, Alexandros Kouris, Stylianos I. Venieris
85. Semantic Quorum Assurance: Collective Certification for Non-Deterministic AI Infrastructure
86. Claw-R1: A Step-Level Data Middleware System for Agentic Reinforcement Learning
Daoyu Wang, Mingyue Cheng, Qingchuan Li, Shuo Yu, Jie Ouyang, Qi Liu
87. Principled Agent Debate: Adversarial Arbitration for Sycophancy Reduction in Large Language Models
88. Aligned but Not Partner-Specific: Distinguishing How Multimodal LLM Agents Succeed in Reference Games Without Human-Like Conventions
Po-Ya Angela Wang, Chinmaya Mishra, Aslı Özyürek, Paula Rubio-Fernández, Esam Ghaleb
89. From Holistic Evaluation to Structured Criteria: Rubrics Across the Evolving LLM Landscape
Hao Chen, Ziyu Han, Yukun Yan, Qingfu Zhu, Maosong Sun, Wanxiang Che
90. PerspectiveGap: A Benchmark for Multi-Agent Orchestration Prompting
Youran Sun, Xingyu Ren, Kejia Zhang, Xinpeng Liu, Jiaxuan Guo
91. What Should a Skill Remember? Quality-Cost Trade-offs in Cost-Aware Skill Rewriting for Language Model Agents
Qinghua Xing, Yinda Chen, Yaping Jin, Zhenhe Wu, Bohan Lin, Hang Zhou, Xinghao Chen, Hanting Chen, Zhiwei Xiong
92. AGENTSERVESIM: A Hardware-aware Simulator for Multi-Turn LLM Agent Serving
Rakibul Hasan Rajib, Mengxin Zheng, Qian Lou
93. Civil Court Simulation with Large Language Models
Yifan Chen, Haitao Li, Kaiyuan Zhang, Yueyue Wu, Qingyao Ai, Yiqun Liu
94. Hardening Agent Benchmarks with Adversarial Hacker-Fixer Loops
Ziqian Zhong, Ivgeni Segal, Ivan Bercovich, Shashwat Saxena, Kexun Zhang, Aditi Raghunathan
95. Context-Fractured Decomposition Attacks on Tool-Using LLM Agents: Exploiting Artifact Provenance Gaps
Xiaofeng Lin, Yukai Yang, Daniel Guo, Sahil Arun Nale, Charles Fleming, Guang Cheng
96. SecureClaw: Clawing Back Control of LLM Agents
97. VisualLeakBench: Reproducible Action-Boundary Propagation Failures in Vision-Language Agents
Youting Wang, Yuan Tang, Yitian Qian, Chen Zhao
98. Crayotter: Traceable Multi-Agent Workflows for Long-Form Video Editing
Lecheng Yan, Yichong Zhang, Ben Pan, Xiaoyu Zheng, Jiawei Qian, Anqi Wu, Wenxi Li, Chenyang Lyu
99. FineGen: A VLM-based Multi-Agent Framework for Fine-Grained Image-Text Dataset Construction
Chang Kong, Yuebing Li, Peng Mo, Haigang Zhang, Qiuming Luo
100. DyCo-RL: Dynamic Cross-Modal Coordination for Visual Reasoning
Hangui Lin, Yan Shu, Zhengyang Liang, Chi Liu, Xiangrui Liu, Minghao Qin, Teng Long, Zheng Liu, Nicu Sebe
101. CRANE: Knowledge Editing for Reasoning MLLMs
Han Huang, Hao Wang, Mengqi Zhang, Shu Wu, Qiang Liu, Liang Wang
102. HDRAgent: An Agentic Framework for Multi-Exposure HDR Imaging
Weiyu Zhou, Tao Hu, Yijian Wang, Xiaogang Xu, Ruixing Wang, Qingsen Yan
103. Claude Code-Driving Scenario Mining for the Argoverse 2 Challenge
Wei Deng, Caoshengzhe Xue, Shuaikun Liu, Zhaohong Liu, Mengshi Qi, Huadong Ma
104. Prisma-World: Camera-Controllable Multi-Agent Video World Model
Huiqiang Sun, Zhan Peng, Size Wu, Kun Wang, Kang Liao, ..., Sheng Jin, Yangguang Li, Zhiguo Cao, Ziwei Liu, Wei Li
105. Evaluating Advanced Prompting on Gemini Flash for Multi-Hop Biomedical QA
Ahmed Bajaber, Mohammed Alliheedi
106. MuJoCo-Drones-Gym: A GPU-Accelerated Multi-Drone Simulator for Control and Reinforcement Learning
107. Cooperative Long Rope Skipping via Multi-Agent Reinforcement Learning
Zihao Wang, Shijie Peng, Kerui Wu, Yu Huang, Ruiqi Xue, Dong Liu, Tian Xu, Lei Yuan, Yang Yu
108. HARBOR: A Harness Framework for Agentic Robot Reinforcement Learning
Zechu Li, Yufeng Jin, Xiaoyang Liu, Puze Liu, Vignesh Prasad, Carlo D'Eramo, Georgia Chalvatzaki
109. Self-Paced Curriculum Reinforcement Learning for Autonomous Superbike Racing in Simulation
Luca Ghisi, Jacopo Essenziale, Carlo D'Eramo, Matteo Luperto
110. Shape Formation for the Cooperative Transportation of Arbitrary Objects Using Multi-Agent Reinforcement Learning
Mohamed Sayed, Wolfram Burgard, Tanja Katharina Kaiser
111. Code Is More Than Text: Uncertainty Estimation for Code Generation
Yuling Shi, Caiqi Zhang, Yuexian Li, Haopeng Wang, Yeheng Chen, Nigel Collier, Xiaodong Gu
112. Why Limit the Residual Stream to Layers and Not Tokens? Persistent Memory for Continuous Latent Reasoning
Mujtaba Farhan, Maheep Chaudhary
113. ComplexConstraints and Beyond: Expert Rubrics for RLVR
Sushant Mehta, Liudas Panavas, Edwin Chen
114. SAW: Stage-Aware Dynamic Weighting for Multi-Objective Reinforcement Learning in Large Language Models
Yuchen He, Baolong Bi, Shenghua Liu, Huaming Liao, Yuyao Ge, Bolin Wan, Siqian Tong, Juan Chen, Jiafeng Guo, Xueqi Cheng
115. Agentic Search for Counterfactual Recourse under Fixed LLM Budgets
116. Data Agents Under Attack: Vulnerabilities in LLM-Driven Analytical Systems
Kuncan Wang, Ziting Wang, Peizhuo Lv, Haoyang Li, Guoliang Li, Gao Cong, Wei Dong
117. Trustworthy Smart Fabs via Professional Proxies: Scaling Safe and Sustainable by Design (SSbD) through Industrial Data Spaces
Han-Teng Liao, Chang-Yi Kao, Karen Ang
118. Multi-Turn Evaluation of Deep Research Agents Under Process-Level Feedback
Rishabh Sabharwal, Hongru Wang, Amos Storkey, Jeff Z. Pan
119. How Small Can You Go? LoRA Fine-Tuning 270M-8B Models for Merchant Information Extraction in Financial Transactions
Donghao Huang, Tomas Drietomsky, Benjamin Barrett, Zhaoxia Wang
120. When No Answer Is Correct: Diagnosing Absent Answer Detection for MLLMs in Video Understanding
Yiheng Wang, Yueqian Lin, Lichen Zhu, Yudong Liu, Hai "Helen" Li, Yiran Chen
121. A Resilience-as-a-Service assessment framework for coordinated disruption response in interdependent urban transit systems
Sara Jaber, S. M. Hassan Mahdavi, Neila Bhouri, Mostafa Ameli
122. Can the Environment Speak for Itself? $T^{2}$-GRPO: A Turn-Trajectory Group Relative Policy Optimization for Caregiver Agents
Yutong Song, Jiang Wu, Pengfei Zhang, Wenjun Huang, Honghui Xu, Nikil Dutt, Amir M. Rahmani
123. Capacity, Not Format: Rethinking Structured Reasoning Failures
124. ZAS-SQL: Distilling Rules from Failures for Zero-Shot Text-to-SQL
Hongzhou Zheng, Yixin Gou, Wenjia Zhang
125. More Yap Less Meaning: Uncovering Self-Improvement Behavior in SLMs
Marina Igitkhanian, Erik Arakelyan
126. SafeRun: Enabling Determinism in LLM Planning for Running
Meilin Chen, Zepeng Zhai, Jiaxuan Zhao, Yuan Lu
127. Sample-Efficient LLM-Based Detection of Malicious Web Server Logs with Forensically Explainable Reasoning
Bernhard Kneip, Nhien-An Le-Khac, Hong-Hanh Nguyen-Le
128. IMUG-Bench: Benchmarking Unified Multimodal Models on Interleaved Understanding and Generation
Lingyi Meng, Zecong Tang, Haoran Li, Tengju Ru, Zhejun Cui, ..., Kaixuan Wang, Yu-Jie Yuan, Chunwei Wang, Yu Zhang, Bo Dai
129. Deterministic Integrity Gates for LLM-Assisted Clinical Manuscript Preparation: An Auditable Biomedical Informatics Architecture
Yoojin Nam, Jinhoon Jeong, Namkug Kim
131. BEACON: Behavioral Entropy Aggregation for Cross-Model Hallucination Detection in Large Language Models
Naveen Bera, Pulijala Sai Nikhila, Kondaguduru Abhiram, Shaik Gayaz Ali, Shoaib Sadiq Salehmohamed, Shaik Mohammed Omar, Jinal Prashant Thakkar, Hansika Aredla, Shalmali Ayachit
132. When Languages Disagree: Self-Evolving Multilingual LLM Judges
133. SAEExplainer: Interpreting SAE Features with Activation-Guided Preference Optimization
Jingyi He, Haiyan Zhao, Ruxue Shi, Yanguang Liu, Xin Wang, Fei Sun, Mengnan Du
134. Proxy Reward Internalization and Mechanistic Exploitation: A Learned Precursor to Reward Hacking and Its Generalization
Mohammad Beigi, Ming Jin, Lifu Huang
135. Tight Sample Complexity of Transformers
Chenxiao Yang, Nathan Srebro, Zhiyuan Li
136. Belief-Space Quantum-Inspired Reinforcement Learning for Partially Observable Autonomous Cyber Defense in the Internet of Vehicles
Anwar Shah, Rohan Farooq, Sajid Anwer, Tallha Akram, Usman Ghous, Sajid Ullah Khan
137. Block-A-Mole: The Sustainability Frontier of Moving-Target Censorship Resistance
138. REACT 2026: The Fourth Multiple Appropriate Facial Reaction Generation Challenge: Personalised MAFRG and Appropriate EEG Reaction Prediction
Siyang Song, Micol Spitale, Zijian Wu, Xiangyu Kong, Cheng Luo, ..., Mohamed Daoudi, Fabien Ringeval, Andrew Howes, Elisabeth Andre, Hatice Gunes