Agentic AI digest - 2026-05-29

A ranked brief from the day's arXiv listing. Cortiq weighs topical fit, lead-author context, and public research signals before the issue is published.

Agentic AI

1. Towards Verifiable Multimodal Deep Research: A Multi-Agent Harness for Interleaved Report Generation

2. The Importance of Out-of-Band Metadata for Safe Autonomous Agents: The Redpanda Agentic Data Plane

3. VitalAgent: A Tool-Augmented Agent for Reactive and Proactive Physiological Monitoring over Wearable Health Data

4. Training Deliberative Monitors for Black-Box Scheming Detection

5. How Consistent Are LLM Agents? Measuring Behavioral Reproducibility in Multi-Step Tool-Calling Pipelines

6. MINDGAMES: A Live Arena for Evaluating Social and Strategic Reasoning in Multi-Agent LLMs

7. RoboWits: Unexpected Challenges for Robotic Creative Problem Solving

8. Hallucination Mitigation with Agentic AI, Nested Learning, and AI Sustainability via Semantic Caching

9. GTA: Generating Long-Horizon Tasks for Web Agents at Scale

10. PTCG-Bench: Can LLM Agents Master Pokémon Trading Card Game?

11. Formalizing Mathematics at Scale

12. Molecular Lead Optimization via Agentic Tool Planning

13. SafeRx-Agent: A Knowledge-Grounded Multi-Agent Framework for Safe and Explainable Medication Recommendation

14. PatchBoard: Schema-Grounded State Mutation for Reliable and Auditable LLM Multi-Agent Collaboration

15. The Best-Laid SCHEMEs: Coordinated Sabotage and Monitoring in Multi-Agent Systems

16. HunterAgent: Neuro-Symbolic Attack Trace Reconstruction under Anti-Forensics

17. VFEAgent: A Multimodal Agent Framework for End-to-End Automated Finite Element Analysis

18. HEART-Bench: Do LLM Agents Exhibit Human-like Psychology?

19. Beyond Consensus: Trace-Level Synthesis in Mixture of Agents

20. Harmonizing Real-Time Constraints and Long-Horizon Reasoning: An Asynchronous Agentic Framework for Dynamic Scheduling

21. Towards Human-Like Interactive Speech Recognition With Agentic Correction and Semantic Evaluation

22. BitTP: The Lightweight Trajectory Prediction Model with BitLLM for Edge-Devices

23. Why Specialist Models Still Matter: A Heterogeneous Multi-Agent Paradigm for Medical Artificial Intelligence

24. SAAS: Self-Aware Reinforcement Learning for Over-Search Mitigation in Agentic Search

25. Redundant or Necessary? A Benchmark for Detecting Redundant Steps in Agent Trajectories

26. Compass: Navigating Global Marine Lead Data Integration through Expert-Guided LLM Agent

27. KairosAgent: Agentic Time Series Forecasting with Fused Semantic Reasoning

28. Enhancing Multi-Agent Communication through Attention Steering with Context Relevance

29. AgentSchool: An LLM-Powered Multi-Agent Simulation for Education

30. Meta-Cognitive Memory Policy Optimization for Long-Horizon LLM Agents

31. No Reader Left Behind: Multi-Agent Summaries Everyone Can Understand

32. SURGENT: A Surgical Multi-Agent Assistance System Across the Perioperative Workflow

33. Does The Way You Plan Matter? An Empirical Study of Planning Representations for LLM Web Agents

34. SEAL: Can Saturated Benchmarks Be Revived by LLM-as-a-Meta-Judge?

35. WorldMemArena: Evaluating Multimodal Agent Memory Through Action-World Interaction

36. AgentCVR: Active Multi-Agent Cross-Video Reasoning via Script-Simulated Reinforcement Learning

37. GenClaw: Code-Driven Agentic Image Generation

38. Frontier LLM-based agents can overcome the ontology curation bottleneck for natural phenotypes

39. Differentiable Belief-based Opponent Shaping

40. The Chain Holds, the Answer Folds: Trace-Answer Dissociation in Reasoning Models Under Adversarial Pressure

41. PRO-CUA: Process-Reward Optimization for Computer Use Agents

42. Governing Technical Debt in Agentic AI Systems

43. BenchTrace: A Benchmark for Testing Reflection Ability and Controlled Evolution in LLM Agents

44. DenseSteer: Steering Small Language Models towards Dense Math Reasoning

45. Indexing the Unreadable: LLM-Native Recursive Construction and Search of Service Taxonomies

46. CoHyDE: Iterative Co-Training of LLM Rewriter & Dense Encoder for Tool Retrieval

47. Diagnosing Harmful Continuation in Answer-Correct Long-CoT Training Traces

48. The Curse of Helpfulness: Inverse Scaling Law in Robustness to Distractor Instructions via DistractionIF

49. Improving Collaborative Storytelling with a Multi-Agent Framework Based on Large Language Models

50. TRACE: Toulmin-based Reasoning Assessment through Constructive Elements for LLM CoT Evaluation

51. GRASP: Gated Regression-Aware Skill Proposer for Self-Improving LLM Agents

52. Croissant Tasks: A Metadata Format for Reproducible Machine Learning Evaluations

53. SkillsInjector: Dynamic Skill Context Construction for LLM Agents

54. AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security

55. Learning to Choose: An Empowerment-Guided Multi-Agent System with semantic communication for Adaptive Method Selection

56. Locally Coherent, Globally Incoherent: Bounding Compositional Incoherence in Multi-Component LLM Agents

57. Representation Signatures and Risk-Feedback Alignment in LLM Trading Agents

58. On Distributional Reinforcement Learning in Chaotic Dynamical Systems

59. Gram: Assessing sabotage propensities via automated alignment auditing

60. GenesisFunc: Multi-Agent Data Generation for Accurate and Generalizable Function-Calling

61. GrowLoop: Self-Evolving Conversation Evaluation Seeded by Human

62. Reasoning that Travels: Dissecting How Chain-of-Thought Transfers Across Models

63. Bosses, Kings, and the Commons: Cooperation Under Power Asymmetry in LLM Societies

64. Analyzing Persona Effects in Generated Explanations from Multimodal LLM Agents in Urban Perception

65. Relevance as a Vulnerability: How Web Retrieval Degrades Safety Alignment in LLM Agents

66. Revisiting Observation Reduction for Web Agents: Comprehensive Evaluation with a Lightweight Framework

67. Learning Design Skills as Memory Policies for Agentic Photonic Inverse Design

68. SkillBrew: Multi-Objective Curation of Skill Banks for LLM Agents

69. DirectorBench: Diagnosing Long-Form Video Generation with Personalized Multi-Agent Evaluation

70. Hijacking Agent Memory: Stealthy Trojan Attacks Through Conversational Interaction

71. Honeyval: A Comprehensive Evaluation Framework for LLM-powered HTTP Honeypots

72. V2XCrafter: Learning to Generate Driving Scene Across Agents

73. FRUC: Feedforward Dynamic Scene Reconstruction from Uncalibrated Collaborative Driving Views

74. FakeVLM-R1: Internalizing Physical Laws via CoT for Synthetic Image Detection

75. AnomalyAgent: Training-Free Agentic Models for Zero-/Few-Shot Anomaly Detection

76. DeepSurvey: Enhancing Analytical Depth and Citation Reliability in Automated Survey Generation

77. ParaTool: Shifting Tool Representations from Context to Parameters

78. DeepTool: Scaling Interleaved Deliberation in Tool-Integrated Reasoning via Process-Supervised Reinforcement Learning

79. Notation Matters: A Benchmark Study of Token-Optimized Formats in Agentic AI Systems