Goto

Collaborating Authors

 Agents


AERO: A Redirection-Based Optimization Framework Inspired by Judo for Robust Probabilistic Forecasting

arXiv.org Artificial Intelligence

Optimization remains a fundamental pillar of machine learning, yet existing methods often struggle to maintain stability and adaptability in dynamic, non linear systems, especially under uncertainty. We introduce AERO (Adversarial Energy-based Redirection Optimization), a novel framework inspired by the redirection principle in Judo, where external disturbances are leveraged rather than resisted. AERO reimagines optimization as a redirection process guided by 15 interrelated axioms encompassing adversarial correction, energy conservation, and disturbance-aware learning. By projecting gradients, integrating uncertainty driven dynamics, and managing learning energy, AERO offers a principled approach to stable and robust model updates. Applied to probabilistic solar energy forecasting, AERO demonstrates substantial gains in predictive accuracy, reliability, and adaptability, especially in noisy and uncertain environments. Our findings highlight AERO as a compelling new direction in the theoretical and practical landscape of optimization.


LAM SIMULATOR: Advancing Data Generation for Large Action Model Training via Online Exploration and Trajectory Feedback

arXiv.org Artificial Intelligence

Large Action Models (LAMs) for AI Agents offer incredible potential but face challenges due to the need for high-quality training data, especially for multi-steps tasks that involve planning, executing tool calls, and responding to feedback. To address these issues, we present LAM SIMULATOR, a comprehensive framework designed for online exploration of agentic tasks with high-quality feedback. Our framework features a dynamic task query generator, an extensive collection of tools, and an interactive environment where Large Language Model (LLM) Agents can call tools and receive real-time feedback. This setup enables LLM Agents to explore and solve tasks autonomously, facilitating the discovery of multiple approaches to tackle any given task. The resulting action trajectory data are then used to create high-quality training datasets for LAMs. Our experiments on popular agentic benchmarks, ToolBench and CRMArena, highlight the effectiveness of LAM SIMULATOR: models trained with self-generated datasets using our framework achieve significant performance gains, up to a 49.3\% improvement over their original baselines. LAM SIMULATOR requires minimal human input during dataset creation, highlighting LAM SIMULATOR's efficiency and effectiveness in speeding up development of AI agents.


Composable Building Blocks for Controllable and Transparent Interactive AI Systems

arXiv.org Artificial Intelligence

While the increased integration of AI technologies into interactive systems enables them to solve an equally increasing number of tasks, the black box problem of AI models continues to spread throughout the interactive system as a whole. Explainable AI (XAI) techniques can make AI models more accessible by employing post-hoc methods or transitioning to inherently interpretable models. While this makes individual AI models clearer, the overarching system architecture remains opaque. To this end, we propose an approach to represent interactive systems as sequences of structural building blocks, such as AI models and control mechanisms grounded in the literature. These can then be explained through accompanying visual building blocks, such as XAI techniques. The flow and APIs of the structural building blocks form an explicit overview of the system. This serves as a communication basis for both humans and automated agents like LLMs, aligning human and machine interpretability of AI models. We discuss a selection of building blocks and concretize our flow-based approach in an architecture and accompanying prototype interactive system.


Stochastically Dominant Peer Prediction

arXiv.org Artificial Intelligence

Eliciting reliable human feedback is essential for many machine learning tasks, such as learning from noisy labels and aligning AI systems with human preferences. Peer prediction mechanisms incentivize truthful reporting without ground truth verification by scoring agents based on correlations with peers. Traditional mechanisms, which ensure that truth-telling maximizes the expected scores in equilibrium, can elicit honest information while assuming agents' utilities are linear functions of their scores. However, in practice, non-linear payment rules are usually preferred, or agents' utilities are inherently non-linear. We propose stochastically dominant truthfulness (SD-truthfulness) as a stronger guarantee: the score distribution of truth-telling stochastically dominates all other strategies, incentivizing truthful reporting for a wide range of monotone utility functions. Our first observation is that no existing peer prediction mechanism naturally satisfies this criterion without strong assumptions. A simple solution -- rounding scores into binary lotteries -- can enforce SD-truthfulness, but often degrades sensitivity, a key property related to fairness and statistical efficiency. We demonstrate how a more careful application of rounding can better preserve sensitivity. Furthermore, we introduce a new enforced agreement (EA) mechanism that is theoretically guaranteed to be SD-truthful in binary-signal settings under mild assumptions, and empirically achieves the highest sensitivity among all known SD-truthful mechanisms.


Fairly Wired: Towards Leximin-Optimal Division of Electricity

arXiv.org Artificial Intelligence

In many parts of the world - particularly in developing countries - the demand for electricity exceeds the available supply. In such cases, it is impossible to provide electricity to all households simultaneously. This raises a fundamental question: how should electricity be allocated fairly? In this paper, we explore this question through the lens of egalitarianism - a principle that emphasizes equality by prioritizing the welfare of the worst-off households. One natural rule that aligns with this principle is to maximize the egalitarian welfare - the smallest utility across all households. We show that computing such an allocation is NP-hard, even under strong simplifying assumptions. Leximin is a stronger fairness notion that generalizes the egalitarian welfare: it also requires to maximize the smallest utility, but then, subject to that, the second-smallest, then the third, and so on. The hardness results extends directly to leximin as well. Despite this, we present a Fully Polynomial-Time Approximation Scheme (FPTAS) for leximin in the special case where the network connectivity graph is a tree. This means that we can efficiently approximate leximin - and, in particular, the egalitarian welfare - to any desired level of accuracy.


Descriptive History Representations: Learning Representations by Answering Questions

arXiv.org Artificial Intelligence

Effective decision making in partially observable environments requires compressing long interaction histories into informative representations. We introduce Descriptive History Representations (DHRs): sufficient statistics characterized by their capacity to answer relevant questions about past interactions and potential future outcomes. DHRs focus on capturing the information necessary to address task-relevant queries, providing a structured way to summarize a history for optimal control. We propose a multi-agent learning framework, involving representation, decision, and question-asking components, optimized using a joint objective that balances reward maximization with the representation's ability to answer informative questions. This yields representations that capture the salient historical details and predictive structures needed for effective decision making. We validate our approach on user modeling tasks with public movie and shopping datasets, generating interpretable textual user profiles which serve as sufficient statistics for predicting preference-driven behavior of users.


Enhancing Interpretability of Quantum-Assisted Blockchain Clustering via AI Agent-Based Qualitative Analysis

arXiv.org Artificial Intelligence

Blockchain transaction data is inherently high dimensional, noisy, and entangled, posing substantial challenges for traditional clustering algorithms. While quantum enhanced clustering models have demonstrated promising performance gains, their interpretability remains limited, restricting their application in sensitive domains such as financial fraud detection and blockchain governance. To address this gap, we propose a two stage analysis framework that synergistically combines quantitative clustering evaluation with AI Agent assisted qualitative interpretation. In the first stage, we employ classical clustering methods and evaluation metrics including the Silhouette Score, Davies Bouldin Index, and Calinski Harabasz Index to determine the optimal cluster count and baseline partition quality. In the second stage, we integrate an AI Agent to generate human readable, semantic explanations of clustering results, identifying intra cluster characteristics and inter cluster relationships. Our experiments reveal that while fully trained Quantum Neural Networks (QNN) outperform random Quantum Features (QF) in quantitative metrics, the AI Agent further uncovers nuanced differences between these methods, notably exposing the singleton cluster phenomenon in QNN driven models. The consolidated insights from both stages consistently endorse the three cluster configuration, demonstrating the practical value of our hybrid approach. This work advances the interpretability frontier in quantum assisted blockchain analytics and lays the groundwork for future autonomous AI orchestrated clustering frameworks.


Will Agents Replace Us? Perceptions of Autonomous Multi-Agent AI

arXiv.org Artificial Intelligence

Autonomous multi-agent AI systems are poised to transform various industries, particularly software development and knowledge work. Understanding current perceptions among professionals is crucial for anticipating adoption challenges, ethical considerations, and future workforce development. This study analyzes responses from 130 participants to a survey on the capabilities, impact, and governance of AI agents. We explore expected timelines for AI replacing programmers, identify perceived barriers to deployment, and examine beliefs about responsibility when agents make critical decisions. Key findings reveal three distinct clusters of respondents. While the study explored factors associated with current AI agent deployment, the initial logistic regression model did not yield statistically significant predictors, suggesting that deployment decisions are complex and may be influenced by factors not fully captured or that a larger sample is needed. These insights highlight the need for organizations to address compliance concerns (a commonly cited barrier) and establish clear governance frameworks as they integrate autonomous agents into their workflows.


EvoGit: Decentralized Code Evolution via Git-Based Multi-Agent Collaboration

arXiv.org Artificial Intelligence

We introduce EvoGit, a decentralized multi-agent framework for collaborative software development driven by autonomous code evolution. EvoGit deploys a population of independent coding agents, each proposing edits to a shared code-base without centralized coordination, explicit message passing, or shared memory. Instead, all coordination emerges through a Git-based phylogenetic graph that tracks the full version lineage and enables agents to asynchronously read from and write to the evolving code repository. This graph-based structure supports fine-grained branching, implicit concurrency, and scalable agent interaction while preserving a consistent historical record. Human involvement is minimal but strategic: users define high-level goals, periodically review the graph, and provide lightweight feedback to promote promising directions or prune unproductive ones. Experiments demonstrate EvoGit's ability to autonomously produce functional and modular software artifacts across two real-world tasks: (1) building a web application from scratch using modern frameworks, and (2) constructing a meta-level system that evolves its own language-model-guided solver for the bin-packing optimization problem. Our results underscore EvoGit's potential to establish a new paradigm for decentralized, automated, and continual software development. Large language models (LLMs) have significantly advanced the automation of software development, excelling at tasks such as code generation, debugging, and documentation Zhao et al. (2023); Minaee et al. (2024); Y ang et al. (2024a); Team et al. (2023); OpenAI et al. (2024). Building upon this foundation, recent efforts have embedded LLMs into autonomous coding agents Qian et al. (2023); Y ang et al. (2024b); Hong et al. (2024); Islam et al. (2024), enabling the execution of multi-step development workflows through tool usage, memory, and interaction policies Y ao et al. (2023); Xi et al. (2023); Wang et al. (2024). While promising, current frameworks face critical limitations. Most rely on scalar reward signals, ground-truth unit tests, or dense human feedback to supervise agent behavior. These assumptions are ill-suited for open-ended software engineering, where success criteria are emergent and multifaceted. Furthermore, agent collaboration is typically centralized and synchronous, requiring a global orchestrator and shared memory, which restricts scalability and robustness. Critically, the development process often lacks traceability: as code branches diverge and recombine, it becomes increasingly difficult to understand the provenance of design decisions or to reproduce successful outcomes.


STORM-BORN: A Challenging Mathematical Derivations Dataset Curated via a Human-in-the-Loop Multi-Agent Framework

arXiv.org Artificial Intelligence

High-quality math datasets are crucial for advancing the reasoning abilities of large language models (LLMs). However, existing datasets often suffer from three key issues: outdated and insufficient challenging content, neglecting human-like reasoning, and limited reliability due to single-LLM generation. To address these, we introduce STORM-BORN, an ultra-challenging dataset of mathematical derivations sourced from cutting-edge academic papers, which includes dense human-like approximations and heuristic cues. To ensure the reliability and quality, we propose a novel human-in-the-loop, multi-agent data generation framework, integrating reasoning-dense filters, multi-agent collaboration, and human mathematicians' evaluations. We curated a set of 2,000 synthetic samples and deliberately selected the 100 most difficult problems. Even most advanced models like GPT-o1 solved fewer than 5% of them. Fine-tuning on STORM-BORN boosts accuracy by 7.84% (LLaMA3-8B) and 9.12% (Qwen2.5-7B). As AI approaches mathematician-level reasoning, STORM-BORN provides both a high-difficulty benchmark and a human-like reasoning training resource. Our code and dataset are publicly available at https://github.com/lwhere/STORM-BORN.