AITopics

2502.14131

Country: Africa > East Africa (0.04)

Genre: Research Report (0.63)

Industry: Leisure & Entertainment (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)
(2 more...)

arXiv.org Machine LearningMar-3-2025

Online Inference for Quantiles by Constant Learning-Rate Stochastic Gradient Descent

Wei, Ziyang, Li, Jiaqi, Chen, Likai, Wu, Wei Biao

This paper proposes an online inference method of the stochastic gradient descent (SGD) with a constant learning rate for quantile loss functions with theoretical guarantees. Since the quantile loss function is neither smooth nor strongly convex, we view such SGD iterates as an irreducible and positive recurrent Markov chain. By leveraging this interpretation, we show the existence of a unique asymptotic stationary distribution, regardless of the arbitrarily fixed initialization. To characterize the exact form of this limiting distribution, we derive bounds for its moment generating function and tail probabilities, controlling over the first and second moments of SGD iterates. By these techniques, we prove that the stationary distribution converges to a Gaussian distribution as the constant learning rate $\eta\rightarrow0$. Our findings provide the first central limit theorem (CLT)-type theoretical guarantees for the last iterate of constant learning-rate SGD in non-smooth and non-strongly convex settings. We further propose a recursive algorithm to construct confidence intervals of SGD iterates in an online manner. Numerical studies demonstrate strong finite-sample performance of our proposed quantile estimator and inference method. The theoretical tools in this study are of independent interest to investigate general transition kernels in Markov chains.

markov chain, stationary distribution, statistics, (15 more...)

arXiv.org Machine Learning

2503.02178

Country:

North America > United States > New York > New York County > New York City (0.14)
North America > United States > Illinois > Cook County > Chicago (0.04)
Asia > Singapore (0.04)
(4 more...)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.57)

Zhang, Yuheng, Jiang, Nan

Statistical Tractability of Off-policy Evaluation of History-dependent Policies in POMDPs

arXiv.org Artificial IntelligenceMar-2-2025

We investigate off-policy evaluation (OPE), a central and fundamental problem in reinforcement learning (RL), in the challenging setting of Partially Observable Markov Decision Processes (POMDPs) with large observation spaces. Recent works of Uehara et al. (2023a); Zhang & Jiang (2024) developed a model-free framework and identified important coverage assumptions (called belief and outcome coverage) that enable accurate OPE of memoryless policies with polynomial sample complexities, but handling more general target policies that depend on the entire observable history remained an open problem. In this work, we prove information-theoretic hardness for model-free OPE of history-dependent policies in several settings, characterized by additional assumptions imposed on the behavior policy (memoryless vs. history-dependent) and/or the state-revealing property of the POMDP (single-step vs. multi-step revealing). We further show that some hardness can be circumvented by a natural model-based algorithm -- whose analysis has surprisingly eluded the literature despite the algorithm's simplicity -- demonstrating provable separation between model-free and model-based OPE in POMDPs.

algorithm, jiang, zhang & jiang, (15 more...)

2503.01134

Country:

North America > United States > Indiana > Tippecanoe County > Lafayette (0.04)
North America > United States > Illinois > Champaign County > Urbana (0.04)

Genre:

Research Report (1.00)
Workflow (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Shervedani, Afagh Mehri, Walter, Matthew R., Zefran, Milos

From Vague Instructions to Task Plans: A Feedback-Driven HRC Task Planning Framework based on LLMs

arXiv.org Artificial IntelligenceMar-2-2025

-- Recent advances in large language models (LLMs) have demonstrated their potential as planners in human-robot collaboration (HRC) scenarios, offering a promising alternative to traditional planning methods. LLMs, which can generate structured plans by reasoning over natural language inputs, have the ability to generalize across diverse tasks and adapt to human instructions. This paper investigates the potential of LLMs to facilitate planning in the context of human-robot collaborative tasks, with a focus on their ability to reason from high-level, vague human inputs, and fine-tune plans based on real-time feedback. We propose a novel hybrid framework that combines LLMs with human feedback to create dynamic, context-aware task plans. Our work also highlights how a single, concise prompt can be used for a wide range of tasks and environments, overcoming the limitations of long, detailed structured prompts typically used in prior studies. By integrating user preferences into the planning loop, we ensure that the generated plans are not only effective but aligned with human intentions. Planning is a fundamental aspect of robotics, enabling autonomous agents to generate sequences of actions to achieve specific goals. Traditional planning methods in the context of human-robot collaboration (HRC) and assistive robots can be broadly categorized into two main types: rule-based planners and learning-based planners . Rule-based planners rely on predefined heuristics and symbolic representations, making them interpretable at the expense of not being able to adapt to complex or dynamic environments. In contrast, learning-based planners, particularly those utilizing deep reinforcement learning, learn to generate plans from experience in an adaptive manner.

execution, llm, refinement, (16 more...)

2503.01007

Country: North America > United States > Illinois > Cook County > Chicago (0.05)

Genre:

Research Report (1.00)
Overview (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Ajdarów, Michal, Main, James C. A., Novotný, Petr, Randour, Mickael

Taming Infinity one Chunk at a Time: Concisely Represented Strategies in One-Counter MDPs

arXiv.org Artificial IntelligenceMar-2-2025

Markov decision processes (MDPs) are a canonical model to reason about decision making within a stochastic environment. We study a fundamental class of infinite MDPs: one-counter MDPs (OC-MDPs). They extend finite MDPs via an associated counter taking natural values, thus inducing an infinite MDP over the set of configurations (current state and counter value). We consider two characteristic objectives: reaching a target state (state-reachability), and reaching a target state with counter value zero (selective termination). The synthesis problem for the latter is not known to be decidable and connected to major open problems in number theory. Furthermore, even seemingly simple strategies (e.g., memoryless ones) in OC-MDPs might be impossible to build in practice (due to the underlying infinite configuration space): we need finite, and preferably small, representations. To overcome these obstacles, we introduce two natural classes of concisely represented strategies based on a (possibly infinite) partition of counter values in intervals. For both classes, and both objectives, we study the verification problem (does a given strategy ensure a high enough probability for the objective?), and two synthesis problems (does there exist such a strategy?): one where the interval partition is fixed as input, and one where it is only parameterized. We develop a generic approach based on a compression of the induced infinite MDP that yields decidability in all cases, with all complexities within PSPACE.

configuration, markov chain, probability, (15 more...)

2503.00788

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
Europe > Czechia (0.04)
(16 more...)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

AIHubMar-1-2025, 16:05:48 GMT

Congratulations to the #AAAI2025 outstanding paper award winners

The AAAI 2025 outstanding paper awards were announced during the opening ceremony of the 39th Annual AAAI Conference on Artificial Intelligence on Thursday 27 February. Papers are recommended for consideration during the review process by members of the Program Committee. This year, three papers have been selected as outstanding papers, with a further paper being recognised in the special track on AI for social impact. Abstract: A fundamental task in multi-agent systems is to match agents to alternatives (e.g., resources or tasks). Often, this is accomplished by eliciting agents' ordinal rankings over the alternatives instead of their exact numerical utilities.

agent, artificial intelligence, machine learning, (15 more...)

AIHub

Country: North America (0.05)

Genre: Personal > Honors > Award (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.54)

Quantum Geometry insights in Deep Learning

Combe, Noémie C.

In this paper, we explore the fundamental role of the Monge-Amp\`ere equation in deep learning, particularly in the context of Boltzmann machines and energy-based models. We first review the structure of Boltzmann learning and its relation to free energy minimization. We then establish a connection between optimal transport theory and deep learning, demonstrating how the Monge-Amp\`ere equation governs probability transformations in generative models. Additionally, we provide insights from quantum geometry, showing that the space of covariance matrices arising in the learning process coincides with the Connes-Araki-Haagerup (CAH) cone in von Neumann algebra theory. Furthermore, we introduce an alternative approach based on renormalization group (RG) flow, which, while distinct from the optimal transport perspective, reveals another manifestation of the Monge-Amp\`ere domain in learning dynamics. This dual perspective offers a deeper mathematical understanding of hierarchical feature learning, bridging concepts from statistical mechanics, quantum geometry, and deep learning theory.

boltzmann machine, matrix, probability distribution, (12 more...)

2503.02655

Country:

Europe > Poland > Masovia Province > Warsaw (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report (0.50)
Overview (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.41)

CLEA: Closed-Loop Embodied Agent for Enhancing Task Execution in Dynamic Environments

Lei, Mingcong, Wang, Ge, Zhao, Yiming, Mai, Zhixin, Zhao, Qing, Guo, Yao, Li, Zhen, Cui, Shuguang, Han, Yatong, Ren, Jinke

Large Language Models (LLMs) exhibit remarkable capabilities in the hierarchical decomposition of complex tasks through semantic reasoning. However, their application in embodied systems faces challenges in ensuring reliable execution of subtask sequences and achieving one-shot success in long-term task completion. To address these limitations in dynamic environments, we propose Closed-Loop Embodied Agent (CLEA) -- a novel architecture incorporating four specialized open-source LLMs with functional decoupling for closed-loop task management. The framework features two core innovations: (1) Interactive task planner that dynamically generates executable subtasks based on the environmental memory, and (2) Multimodal execution critic employing an evaluation framework to conduct a probabilistic assessment of action feasibility, triggering hierarchical re-planning mechanisms when environmental perturbations exceed preset thresholds. To validate CLEA's effectiveness, we conduct experiments in a real environment with manipulable objects, using two heterogeneous robots for object search, manipulation, and search-manipulation integration tasks. Across 12 task trials, CLEA outperforms the baseline model, achieving a 67.3% improvement in success rate and a 52.8% increase in task completion rate. These results demonstrate that CLEA significantly enhances the robustness of task planning and execution in dynamic environments.

arxiv preprint arxiv, clea, robot, (13 more...)

2503.00729

Country:

Asia > China > Guangdong Province > Shenzhen (0.05)
Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Hong Kong (0.04)
(2 more...)

Genre: Research Report > New Finding (0.66)

Industry: Energy > Renewable > Geothermal > Geothermal Energy Systems and Facilities > Geothermal System for Power Generation > Advanced Geothermal System (AGS) (0.84)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
(2 more...)

Interact, Instruct to Improve: A LLM-Driven Parallel Actor-Reasoner Framework for Enhancing Autonomous Vehicle Interactions

Fang, Shiyu, Liu, Jiaqi, Xu, Chengkai, Lv, Chen, Hang, Peng, Sun, Jian

Autonomous Vehicles (AVs) have entered the commercialization stage, but their limited ability to interact and express intentions still poses challenges in interactions with Human-driven Vehicles (HVs). Recent advances in large language models (LLMs) enable bidirectional human-machine communication, but the conflict between slow inference speed and the need for real-time decision-making challenges practical deployment. To address these issues, this paper introduces a parallel Actor-Reasoner framework designed to enable explicit bidirectional AV-HV interactions across multiple scenarios. First, by facilitating interactions between the LLM-driven Reasoner and heterogeneous simulated HVs during training, an interaction memory database, referred to as the Actor, is established. Then, by introducing the memory partition module and the two-layer memory retrieval module, the Actor's ability to handle heterogeneous HVs is significantly enhanced. Ablation studies and comparisons with other decision-making methods demonstrate that the proposed Actor-Reasoner framework significantly improves safety and efficiency. Finally, with the combination of the external Human-Machine Interface (eHMI) information derived from Reasoner's reasoning and the feasible action solutions retrieved from the Actor, the effectiveness of the proposed Actor-Reasoner is confirmed in multi-scenario field interactions. Our code is available at https://github.com/FanGShiYuu/Actor-Reasoner.

interaction, reasoner, scenario, (15 more...)

2503.00502

Country:

Asia > China > Shanghai > Shanghai (0.05)
Asia > China > Beijing > Beijing (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
(4 more...)

Genre: Research Report (0.65)

Industry:

Transportation > Ground > Road (1.00)
Automobiles & Trucks (1.00)
Health & Medicine (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Nucleolus Credit Assignment for Effective Coalitions in Multi-agent Reinforcement Learning

Li, Yugu, Cao, Zehong, Qiao, Jianglin, Hu, Siyi

In cooperative multi-agent reinforcement learning (MARL), agents typically form a single grand coalition based on credit assignment to tackle a composite task, often resulting in suboptimal performance. This paper proposed a nucleolus-based credit assignment grounded in cooperative game theory, enabling the autonomous partitioning of agents into multiple small coalitions that can effectively identify and complete subtasks within a larger composite task. Specifically, our designed nucleolus Q-learning could assign fair credits to each agent, and the nucleolus Q-operator provides theoretical guarantees with interpretability for both learning convergence and the stability of the formed small coalitions. Through experiments on Predator-Prey and StarCraft scenarios across varying difficulty levels, our approach demonstrated the emergence of multiple effective coalitions during MARL training, leading to faster learning and superior performance in terms of win rate and cumulative rewards especially in hard and super-hard environments, compared to four baseline methods. Our nucleolus-based credit assignment showed the promise for complex composite tasks requiring effective subteams of agents.

agent, coalition, q-value, (14 more...)

2503.00372

Country:

Oceania > Australia > South Australia > Adelaide (0.05)
North America > United States > Michigan > Wayne County > Detroit (0.04)
North America > United States > Texas (0.04)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games (0.56)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)