Agents
GenCellAgent: Generalizable, Training-Free Cellular Image Segmentation via Large Language Model Agents
Yu, Xi, Yang, Yang, Liu, Qun, Du, Yonghua, McSweeney, Sean, Lin, Yuewei
Cellular image segmentation is essential for quantitative biology yet remains difficult due to heterogeneous modalities, morphological variability, and limited annotations. We present GenCellAgent, a training-free multi-agent framework that orchestrates specialist segmenters and generalist vision-language models via a planner-executor-evaluator loop (choose tool $\rightarrow$ run $\rightarrow$ quality-check) with long-term memory. The system (i) automatically routes images to the best tool, (ii) adapts on the fly using a few reference images when imaging conditions differ from what a tool expects, (iii) supports text-guided segmentation of organelles not covered by existing models, and (iv) commits expert edits to memory, enabling self-evolution and personalized workflows. Across four cell-segmentation benchmarks, this routing yields a 15.7\% mean accuracy gain over state-of-the-art baselines. On endoplasmic reticulum and mitochondria from new datasets, GenCellAgent improves average IoU by 37.6\% over specialist models. It also segments novel objects such as the Golgi apparatus via iterative text-guided refinement, with light human correction further boosting performance. Together, these capabilities provide a practical path to robust, adaptable cellular image segmentation without retraining, while reducing annotation burden and matching user preferences.
Too Open for Opinion? Embracing Open-Endedness in Large Language Models for Social Simulation
Ma, Bolei, Cao, Yong, Sen, Indira, Haensch, Anna-Carolina, Kreuter, Frauke, Plank, Barbara, Hershcovich, Daniel
Large Language Models (LLMs) are increasingly used to simulate public opinion and other social phenomena. Most current studies constrain these simulations to multiple-choice or short-answer formats for ease of scoring and comparison, but such closed designs overlook the inherently generative nature of LLMs. In this position paper, we argue that open-endedness, using free-form text that captures topics, viewpoints, and reasoning processes "in" LLMs, is essential for realistic social simulation. Drawing on decades of survey-methodology research and recent advances in NLP, we argue why this open-endedness is valuable in LLM social simulations, showing how it can improve measurement and design, support exploration of unanticipated views, and reduce researcher-imposed directive bias. It also captures expressiveness and individuality, aids in pretesting, and ultimately enhances methodological utility. We call for novel practices and evaluation frameworks that leverage rather than constrain the open-ended generative diversity of LLMs, creating synergies between NLP and social science.
From Craft to Constitution: A Governance-First Paradigm for Principled Agent Engineering
Xu, Qiang, Wen, Xiangyu, Xu, Changran, Li, Zeju, Zhong, Jianyuan
The advent of powerful Large Language Models (LLMs) has ushered in an ``Age of the Agent,'' enabling autonomous systems to tackle complex goals. However, the transition from prototype to production is hindered by a pervasive ``crisis of craft,'' resulting in agents that are brittle, unpredictable, and ultimately untrustworthy in mission-critical applications. This paper argues this crisis stems from a fundamental paradigm mismatch -- attempting to command inherently probabilistic processors with the deterministic mental models of traditional software engineering. To solve this crisis, we introduce a governance-first paradigm for principled agent engineering, embodied in a formal architecture we call ArbiterOS.
ConDABench: Interactive Evaluation of Language Models for Data Analysis
Dutta, Avik, Gupta, Priyanshu, Hasanbeig, Hosein, Singh, Rahul Pratap, Nigam, Harshit, Gulwani, Sumit, Radhakrishna, Arjun, Soares, Gustavo, Tiwari, Ashish
Real-world data analysis tasks often come with under-specified goals and unclean data. User interaction is necessary to understand and disambiguate a user's intent, and hence, essential to solving these complex tasks. Existing benchmarks for evaluating LLMs on data analysis tasks do not capture these complexities or provide first-class support for interactivity. We introduce ConDABench, a framework for generating conversational data analysis (ConDA) benchmarks and evaluating external tools on the generated benchmarks. \bench consists of (a) a multi-agent workflow for generating realistic benchmarks from articles describing insights gained from public datasets, (b) 1,420 ConDA problems generated using this workflow, and (c) an evaluation harness that, for the first time, makes it possible to systematically evaluate conversational data analysis tools on the generated ConDA problems. Evaluation of state-of-the-art LLMs on the benchmarks reveals that while the new generation of models are better at solving more instances, they are not necessarily better at solving tasks that require sustained, long-form engagement. ConDABench is an avenue for model builders to measure progress towards truly collaborative models that can complete complex interactive tasks.
Towards Neurocognitive-Inspired Intelligence: From AI's Structural Mimicry to Human-Like Functional Cognition
Golilarz, Noorbakhsh Amiri, Khatib, Hassan S. Al, Rahimi, Shahram
Artificial intelligence has advanced significantly through deep learning, reinforcement learning, and large language and vision models. However, these systems often remain task specific, struggle to adapt to changing conditions, and cannot generalize in ways similar to human cognition. Additionally, they mainly focus on mimicking brain structures, which often leads to black-box models with limited transparency and adaptability. Inspired by the structure and function of biological cognition, this paper introduces the concept of "Neurocognitive-Inspired Intelligence (NII)," a hybrid approach that combines neuroscience, cognitive science, computer vision, and AI to develop more general, adaptive, and robust intelligent systems capable of rapid learning, learning from less data, and leveraging prior experience. These systems aim to emulate the human brain's ability to flexibly learn, reason, remember, perceive, and act in real-world settings with minimal supervision. We review the limitations of current AI methods, define core principles of neurocognitive-inspired intelligence, and propose a modular, biologically inspired architecture that emphasizes integration, embodiment, and adaptability. We also discuss potential implementation strategies and outline various real-world applications, from robotics to education and healthcare. Importantly, this paper offers a hybrid roadmap for future research, laying the groundwork for building AI systems that more closely resemble human cognition.
A2AS: Agentic AI Runtime Security and Self-Defense
Neelou, Eugene, Novikov, Ivan, Moroz, Max, Narayan, Om, Saade, Tiffany, Ayenson, Mika, Kabanov, Ilya, Ozmen, Jen, Lee, Edward, Narajala, Vineeth Sai, Junior, Emmanuel Guilherme, Huang, Ken, Gulsin, Huseyin, Ross, Jason, Vyshegorodtsev, Marat, Travers, Adelin, Habler, Idan, Jadav, Rahul
A2AS enforces certified behavior, activates model self-defense, and ensures context wi ndow i ntegrity. It defi nes security boundaries, authentic ates prompts, applies security rules and custom policies, and cont rols agentic behavior, enabli ng a defense-i n-depth st rategy. The A2AS framework avoids latency overhe ad, external dependencies, architectural changes, model ret rai ni ng, and operational complexity. The BASIC security model is i nt roduced as the A2AS foundation: (B) Behavior certific ates enable behavior enforcement, (A) Authentic ated prompts enable context wi ndow i ntegrity, (S) S ecurity boundaries enable unt rusted i nput isol ation, (I) I n-context defenses enable secure model re asoni ng, (C) Codified policies enable applic ation-specific rules. The advancements i n Artificial I ntelligence (AI) and its i ntegration across sensitive fields, such as he althc are, fi nance, and critic al i nfrast ructure, have i ncre ased the attack surface of such AI systems. They expose applic ations and data to risks of exfilt ration, i nfection, and mani pulation, potentially compromisi ng confidentiality, i ntegrity, and availability. Movi ng beyond theoretic al risks, a growi ng number of re al-world AI security i ncidents are bei ng reported [1] . The developments i n Large Language Models (LLMs) have i nt roduced a p aradigm shift i n AI engi neeri ng, where buildi ng AI systems is largely centered around i ntegrati ng LLM models. These models have thei r i nherent vulnerabilities that exp and the attack surface and i nt roduce additional security risks.
AI-Agents for Culturally Diverse Online Higher Education Environments
Sun, Fuze, Craig, Paul, Li, Lingyu, Meng, Shixiangyue, Nan, Chuxi
As the global reach of online higher education continues to grow, universities are increasingly accommodating students from diverse cultural backgrounds (Tereshko et al., 2024). This can present a number of challenges including linguistic barriers (Ullah et al., 2021), cultural differences in learning style (Omidvar & Tan, 2012), cultural sensitivity in course design (Nguyen, 2022) and perceived isolation when students feel their perspectives or experiences are not reflected or valued in the learning environment (Hansen-Brown et al., 2022). Ensuring active engagement and reasonable learning outcomes in such a environments requires distance educational systems that are not only adaptive but also culturally resonant (Dalle et al., 2024). Both embodied and virtual AI-Agents have great potential in this regard as they can facilitate personalized learning and adapt their interactions and content delivery to align with students' cultural context. In addition, Generative AI (GAI), such as, Large Language Models (LLMs) can amplify the potential for these culturally aware AI agents to address educational challenges due to their advanced capacity for understanding and generating contextually relevant content (Wang et al., 2024). This chapter reviews existing research and suggests the usage of culturally aware AI-Agents, powered by GAI, to foster engagement and improve learning outcomes in culturally diverse online higher education environments.
RareAgent: Self-Evolving Reasoning for Drug Repurposing in Rare Diseases
Qin, Lang, Gan, Zijian, Cao, Xu, Jiang, Pengcheng, Jiang, Yankai, Han, Jiawei, Wu, Kaishun, Chen, Jintai
Computational drug repurposing for rare diseases is especially challenging when no prior associations exist between drugs and target diseases. Therefore, knowledge graph completion and message-passing GNNs have little reliable signal to learn and propagate, resulting in poor performance. We present RareAgent, a self-evolving multi-agent system that reframes this task from passive pattern recognition to active evidence-seeking reasoning. RareAgent organizes task-specific adversarial debates in which agents dynamically construct evidence graphs from diverse perspectives to support, refute, or entail hypotheses. The reasoning strategies are analyzed post hoc in a self-evolutionary loop, producing textual feedback that refines agent policies, while successful reasoning paths are distilled into transferable heuristics to accelerate future investigations. Comprehensive evaluations reveal that RareAgent improves the indication AUPRC by 18.1% over reasoning baselines and provides a transparent reasoning chain consistent with clinical evidence.
Paper2Agent: Reimagining Research Papers As Interactive and Reliable AI Agents
Miao, Jiacheng, Davis, Joe R., Zhang, Yaohui, Pritchard, Jonathan K., Zou, James
We introduce Paper2Agent, an automated framework that converts research papers into AI agents. Paper2Agent transforms research output from passive artifacts into active systems that can accelerate downstream use, adoption, and discovery. Conventional research papers require readers to invest substantial effort to understand and adapt a paper's code, data, and methods to their own work, creating barriers to dissemination and reuse. Paper2Agent addresses this challenge by automatically converting a paper into an AI agent that acts as a knowledgeable research assistant. It systematically analyzes the paper and the associated codebase using multiple agents to construct a Model Context Protocol (MCP) server, then iteratively generates and runs tests to refine and robustify the resulting MCP. These paper MCPs can then be flexibly connected to a chat agent (e.g. Claude Code) to carry out complex scientific queries through natural language while invoking tools and workflows from the original paper. We demonstrate Paper2Agent's effectiveness in creating reliable and capable paper agents through in-depth case studies. Paper2Agent created an agent that leverages AlphaGenome to interpret genomic variants and agents based on ScanPy and TISSUE to carry out single-cell and spatial transcriptomics analyses. We validate that these paper agents can reproduce the original paper's results and can correctly carry out novel user queries. Paper2Agent automatically created AI co-scientist that identified new splicing variant associated with ADHD risk. By turning static papers into dynamic, interactive AI agents, Paper2Agent introduces a new paradigm for knowledge dissemination and a foundation for the collaborative ecosystem of AI co-scientists.
ABMax: A JAX-based Agent-based Modeling Framework
Chaturvedi, Siddharth, El-Gazzar, Ahmed, van Gerven, Marcel
Agent-based modeling (ABM) is a principal approach for studying complex systems. By decomposing a system into simpler, interacting agents, agent-based modeling (ABM) allows researchers to observe the emergence of complex phenomena. High-performance array computing libraries like JAX can help scale such computational models to a large number of agents by using automatic vectorization and just-in-time (JIT) compilation. One of the caveats of using JAX to achieve such scaling is that the shapes of arrays used in the computational model should remain immutable throughout the simulation. In the context of agent-based modeling (ABM), this can pose constraints on certain agent manipulation operations that require flexible data structures. A subset of which is represented by the ability to update a dynamically selected number of agents by applying distinct changes to them during a simulation. To this effect, we introduce ABMax, an ABM framework based on JAX that implements multiple just-in-time (JIT) compilable algorithms to provide this functionality. On the canonical predation model benchmark, ABMax achieves runtime performance comparable to state-of-the-art implementations. Further, we show that this functionality can also be vectorized, making it possible to run many similar agent-based models in parallel. We also present two examples in the form of a traffic-flow model and a financial market model to show the use case of ABMax