AITopics

1905.00495

Country: North America > Canada (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.93)

Cribeiro-Ramallo, Jose, Aich, Agnideep, Kalinke, Florian, Aich, Ashit Baran, Szabó, Zoltán

The Minimax Lower Bound of Kernel Stein Discrepancy Estimation

arXiv.org Machine Learning, Oct-20-2025

Kernel Stein discrepancies (KSDs) have emerged as a powerful tool for quantifying goodness-of-fit over the last decade, featuring numerous successful applications. To the best of our knowledge, all existing KSD estimators with known rate achieve $\sqrt n$-convergence. In this work, we present two complementary results (with different proof strategies), establishing that the minimax lower bound of KSD estimation is $n^{-1/2}$ and settling the optimality of these estimators. Our first result focuses on KSD estimation on $\mathbb R^d$ with the Langevin-Stein operator; our explicit constant for the Gaussian kernel indicates that the difficulty of KSD estimation may increase exponentially with the dimensionality $d$. Our second result settles the minimax lower bound for KSD estimation on general domains.
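For orientation, here is a minimal sketch of one common plug-in (V-statistic) estimator of the squared KSD with the Langevin-Stein operator and a Gaussian kernel. It is an illustration only, not necessarily one of the estimators the paper analyzes. The standard normal target, the fixed bandwidth, and all function names are assumptions made for the example.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = Sequence[float]


def ksd_squared_v_stat(
    samples: List[Vector],
    score: Callable[[Vector], List[float]],
    bandwidth: float = 1.0,
) -> float:
    """V-statistic estimate of KSD^2 for the Gaussian kernel
    k(x, y) = exp(-||x - y||^2 / (2 h^2)) and Langevin-Stein operator.

    `score` is the gradient of the log target density (an assumed input).
    """
    n = len(samples)
    d = len(samples[0])
    h2 = bandwidth ** 2
    scores = [score(x) for x in samples]
    total = 0.0
    for x, sx in zip(samples, scores):
        for y, sy in zip(samples, scores):
            diff = [a - b for a, b in zip(x, y)]
            sq = sum(v * v for v in diff)
            k = math.exp(-sq / (2.0 * h2))
            # s(x)^T s(y) k
            s_dot = sum(a * b for a, b in zip(sx, sy))
            # s(x)^T grad_y k + s(y)^T grad_x k, divided by k
            cross = sum((a - b) * v for a, b, v in zip(sx, sy, diff)) / h2
            # trace of grad_x grad_y k, divided by k
            trace = d / h2 - sq / (h2 * h2)
            total += k * (s_dot + cross + trace)
    return total / (n * n)


def standard_normal_score(x: Vector) -> List[float]:
    """Score of N(0, I): grad log p(x) = -x (assumed target for the demo)."""
    return [-v for v in x]


if __name__ == "__main__":
    rng = random.Random(0)
    matched = [[rng.gauss(0.0, 1.0)] for _ in range(200)]
    shifted = [[rng.gauss(2.0, 1.0)] for _ in range(200)]
    print("KSD^2, samples from the target:", ksd_squared_v_stat(matched, standard_normal_score))
    print("KSD^2, samples shifted by 2:  ", ksd_squared_v_stat(shifted, standard_normal_score))
```

The estimate should be close to zero when the samples come from the target and clearly larger when they do not. The double loop costs O(n^2) kernel evaluations.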

artificial intelligence, ksd, machine learning, (17 more...)

2510.15058

Country:

North America > United States (0.28)
Europe (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.81)

ToolPRM: Fine-Grained Inference Scaling of Structured Outputs for Function Calling

Lin, Jianghao, Shi, Yuanyuan, Peng, Xin, Ding, Renjie, Wang, Hairui, Peng, Yuxuan, Bai, Bizhe, Song, Weixi, Bai, Fengshuo, Chai, Huacan, Zhang, Weinan, Huang, Fei, Wen, Ying

Large language models (LLMs) are increasingly demonstrating strong capabilities as autonomous agents, with function calling serving as a core mechanism for interaction with the environment. Meanwhile, inference scaling has become a cutting-edge technique to enhance LLM performance by allocating more computational resources during the inference process. However, current research on inference scaling primarily focuses on unstructured output generation tasks, leaving its application in structured outputs, like function calling, largely underexplored. To bridge this gap, we propose an inference scaling framework that combines fine-grained beam search with a process reward model, ToolPRM, which scores the internal steps of each single function call. To train ToolPRM, we construct the first fine-grained intra-call process supervision dataset, automatically annotated with function-masking techniques to provide step-level rewards for structured tool-use reasoning. Extensive experiments demonstrate that ToolPRM beats the coarse-grained and outcome reward models in terms of predictive accuracy, indicating its stronger capability in supervising the function calling inference process. The inference scaling technique equipped with ToolPRM also significantly improves the backbone model performance across various function calling tasks and benchmarks. More importantly, we reveal a key principle for applying inference scaling techniques to structured outputs: "explore more but retain less" due to the unrecoverability characteristics of structured function calling generation.
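The "explore more but retain less" idea can be pictured with a toy step-level beam search. This is a sketch under stated assumptions, not the ToolPRM implementation: the vocabulary, the target call, and `toy_step_score` (a stand-in for a learned process reward model) are all hypothetical.

```python
from typing import Callable, List, Tuple

State = Tuple[str, ...]

# Hypothetical step-wise choices: function name, argument name, argument value.
VOCAB: List[List[str]] = [
    ["get_time", "get_weather", "send_mail"],
    ["user", "city", "zone"],
    ["Paris", "Tokyo", "Lima"],
]
TARGET: State = ("get_weather", "city", "Paris")


def expand(state: State) -> List[State]:
    """All one-step continuations of a partial function call."""
    return [state + (tok,) for tok in VOCAB[len(state)]]


def is_done(state: State) -> bool:
    return len(state) == len(VOCAB)


def toy_step_score(state: State) -> float:
    """Stand-in for a process reward model: counts correct steps so far."""
    return float(sum(1 for a, b in zip(state, TARGET) if a == b))


def beam_search(
    score: Callable[[State], float],
    explore_width: int,
    retain_width: int,
) -> State:
    """Expand up to `explore_width` continuations per beam, keep `retain_width`."""
    beams: List[State] = [()]
    while not all(is_done(s) for s in beams):
        candidates: List[State] = []
        for state in beams:
            if is_done(state):
                candidates.append(state)
            else:
                candidates.extend(expand(state)[:explore_width])
        # Score every partial call and keep only the best few.
        candidates.sort(key=score, reverse=True)
        beams = candidates[:retain_width]
    return beams[0]


wide_then_narrow = beam_search(toy_step_score, explore_width=3, retain_width=1)
narrow_only = beam_search(toy_step_score, explore_width=1, retain_width=1)
print(wide_then_narrow)
print(narrow_only)
```

In this toy setting, exploring all three continuations per step and retaining a single beam recovers the target call, while exploring only one continuation locks in a wrong function name at the first step and never recovers.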

arxiv preprint arxiv, large language model, machine learning, (19 more...)

2510.14703

Country: Asia (0.28)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Blair, Carter, Larson, Kate

Generating Fair Consensus Statements with Social Choice on Token-Level MDPs

Current frameworks for consensus statement generation with large language models lack the inherent structure needed to provide provable fairness guarantees when aggregating diverse free-form opinions. We model the task as a multi-objective, token-level Markov Decision Process (MDP), where each objective corresponds to an agent's preference. Token-level rewards for each agent are derived from their policy (e.g., a personalized language model). This approach utilizes the finding that such policies implicitly define optimal Q-functions, providing a principled way to quantify rewards at each generation step without a value function (Rafailov et al., 2024). This MDP formulation creates a formal structure amenable to analysis using principles from social choice theory. We propose two approaches grounded in social choice theory. First, we propose a stochastic generation policy guaranteed to be in the ex-ante core, extending core stability concepts from voting theory to text generation. This policy is derived from an underlying distribution over complete statements that maximizes proportional fairness (Nash Welfare). Second, for generating a single statement, we target the maximization of egalitarian welfare using search algorithms within the MDP framework. Empirically, experiments using language models to instantiate agent policies show that search guided by the egalitarian objective generates consensus statements with improved worst-case agent alignment compared to baseline methods, including the Habermas Machine (Tessler et al., 2024).
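The welfare objectives named in the abstract can rank the same candidates differently. The sketch below only scores complete candidate statements with made-up utilities. It does not reproduce the paper's token-level MDP, its search procedure, or its stochastic policy, and the candidate names and numbers are hypothetical.

```python
import math
from typing import Callable, Dict, List

# Hypothetical per-agent utilities (one number per agent) for three statements.
CANDIDATES: Dict[str, List[float]] = {
    "A": [0.9, 0.1, 0.8],
    "B": [0.5, 0.5, 0.5],
    "C": [0.7, 0.4, 0.6],
}


def egalitarian_welfare(utilities: List[float]) -> float:
    """Welfare of the worst-off agent."""
    return min(utilities)


def log_nash_welfare(utilities: List[float]) -> float:
    """Log of the product of utilities (requires strictly positive utilities)."""
    return sum(math.log(u) for u in utilities)


def utilitarian_welfare(utilities: List[float]) -> float:
    """Sum of utilities, shown for contrast."""
    return sum(utilities)


def pick(candidates: Dict[str, List[float]], welfare: Callable[[List[float]], float]) -> str:
    """Return the candidate name that maximizes the given welfare function."""
    return max(candidates, key=lambda name: welfare(candidates[name]))


print("egalitarian:", pick(CANDIDATES, egalitarian_welfare))
print("Nash:", pick(CANDIDATES, log_nash_welfare))
print("utilitarian:", pick(CANDIDATES, utilitarian_welfare))
```

With these numbers the egalitarian objective prefers the statement whose least satisfied agent fares best, the Nash objective prefers a balanced but higher-product statement, and the utilitarian sum prefers the statement that leaves one agent far behind.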

large language model, machine learning, natural language, (17 more...)

2510.14106

Country:

North America > United States (0.46)
Europe > United Kingdom (0.28)

Genre: Research Report (0.63)

Industry:

Government > Immigration & Customs (0.93)
Health & Medicine (0.93)
Law (0.68)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Merge and Guide: Unifying Model Merging and Guided Decoding for Controllable Multi-Objective Generation

Xie, Guofu, Zhang, Chen, Zhang, Xiao, Shi, Yunsheng, Yao, Ting, Xu, Jun

Adapting to diverse user needs at test time is a key challenge in controllable multi-objective generation. Existing methods are insufficient: merging-based approaches provide indirect, suboptimal control at the parameter level, often disregarding the impacts of multiple objectives. While decoding-based guidance is more direct, it typically requires aggregating logits from multiple expert models, incurring significant space overhead and relying heavily on individual model capacity. To address these issues, we introduce Merge-And-GuidE (MAGE), a two-stage framework that leverages model merging for guided decoding. We first identify a critical compatibility problem between the guidance and base models. In Stage 1, MAGE resolves this by dynamically constructing a more robust base model, merging a series of backbone models that account for multiple objectives. In Stage 2, we merge explicit and implicit value models into a unified guidance proxy, which then steers the decoding of the base model from Stage 1. Our analysis empirically validates Linear Mode Connectivity (LMC) in value models, explores the relationship between model merging and prediction ensembling, and demonstrates the enhanced controllability afforded by our approach. Extensive experiments show that our method outperforms existing approaches, achieving superior controllability, Pareto-optimal performance, and enhanced adaptability.
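As a rough illustration of parameter-level merging and its relation to prediction ensembling, the sketch below averages the parameters of two tiny linear models with user preference weights. It is not the MAGE procedure. The model layout, the helper names, and the weights are assumptions, and the exact equality shown holds only because the toy model is linear in its parameters.

```python
import math
from typing import Dict, List

Params = Dict[str, List[float]]


def merge_parameters(models: List[Params], weights: List[float]) -> Params:
    """Weighted average of parameters, with weights normalized to sum to 1."""
    if not models or len(models) != len(weights):
        raise ValueError("need one weight per model")
    total = sum(weights)
    norm = [w / total for w in weights]
    merged: Params = {}
    for name in models[0]:
        size = len(models[0][name])
        merged[name] = [
            sum(w * m[name][i] for w, m in zip(norm, models)) for i in range(size)
        ]
    return merged


def linear_predict(params: Params, x: List[float]) -> float:
    """Toy model: f(x) = w . x + b."""
    return sum(w * v for w, v in zip(params["w"], x)) + params["b"][0]


# Two hypothetical single-objective experts and a 70/30 user preference.
helpful = {"w": [1.0, -2.0], "b": [0.5]}
harmless = {"w": [0.0, 3.0], "b": [-1.0]}
preference = [0.7, 0.3]

merged = merge_parameters([helpful, harmless], preference)
x = [2.0, 1.0]
merged_output = linear_predict(merged, x)
ensembled_output = 0.7 * linear_predict(helpful, x) + 0.3 * linear_predict(harmless, x)
print(merged_output, ensembled_output)
```

For nonlinear networks the merged model and the ensemble generally differ, which is why properties such as linear mode connectivity matter when merging is used as a stand-in for ensembling.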

large language model, machine learning, natural language, (21 more...)

2510.03782

Country:

Asia (0.92)
North America > United States (0.92)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
(2 more...)

ExoPredicator: Learning Abstract Models of Dynamic Worlds for Robot Planning

Liang, Yichao, Nguyen, Dat, Yang, Cambridge, Li, Tianyang, Tenenbaum, Joshua B., Rasmussen, Carl Edward, Weller, Adrian, Tavares, Zenna, Silver, Tom, Ellis, Kevin

Long-horizon embodied planning is challenging because the world does not only change through an agent's actions: exogenous processes (e.g., water heating, dominoes cascading) unfold concurrently with the agent's actions. We propose a framework for abstract world models that jointly learns (i) symbolic state representations and (ii) causal processes for both endogenous actions and exogenous mechanisms. Each causal process models the time course of a stochastic cause-effect relation. We learn these world models from limited data via variational Bayesian inference combined with LLM proposals. Across five simulated tabletop robotics environments, the learned models enable fast planning that generalizes to held-out tasks with more objects and more complex goals, outperforming a range of baselines.

artificial intelligence, bayesian inference, robot, (18 more...)

2509.26255

Genre:

Research Report (0.63)
Workflow (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
(2 more...)

Lee, Beomjoon, Nam, Changjoo

A Hierarchical Bin Packing Framework with Dual Manipulators via Heuristic Search and Deep Reinforcement Learning

arXiv.org Artificial Intelligence, Oct-16-2025

We address the bin packing problem (BPP), which aims to maximize bin utilization when packing a variety of items. The offline problem, where the complete information about the item set and their sizes is known in advance, is proven to be NP-hard. The semi-online and online variants are even more challenging, as full information about incoming items is unavailable. While existing methods have tackled both 2D and 3D BPPs, the 2D BPP remains underexplored in terms of fully maximizing utilization. We propose a hierarchical approach for solving the 2D online and semi-online BPP by combining deep reinforcement learning (RL) with heuristic search. The heuristic search selects which item to pack or unpack, determines the packing order, and chooses the orientation of each item, while the RL agent decides the precise position within the bin. Our method is capable of handling diverse scenarios, including repacking, varying levels of item information, differing numbers of accessible items, and coordination of dual manipulators. Experimental results demonstrate that our approach achieves near-optimal utilization across various practical scenarios, largely due to its repacking capability. In addition, the algorithm is evaluated in a physics-based simulation environment, where execution time is measured to assess its real-world performance.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

2506.01628

Genre: Research Report > New Finding (0.34)

Industry: Leisure & Entertainment (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Mazuelas, Santiago, Alvarez, Veronica

Robust Minimax Boosting with Performance Guarantees

arXiv.org Machine Learning, Oct-16-2025

Boosting methods often achieve excellent classification accuracy, but can experience notable performance degradation in the presence of label noise. Existing robust methods for boosting provide theoretical robustness guarantees for certain types of label noise, and can exhibit only moderate performance degradation. However, previous theoretical results do not account for realistic types of noise and finite training sizes, and existing robust methods can provide unsatisfactory accuracies, even without noise. This paper presents methods for robust minimax boosting (RMBoost) that minimize worst-case error probabilities and are robust to general types of label noise. In addition, we provide finite-sample performance guarantees for RMBoost with respect to the error obtained without noise and with respect to the best possible error (Bayes risk). The experimental results corroborate that RMBoost is not only resilient to label noise but can also provide strong classification accuracy.

artificial intelligence, machine learning, (17 more...)

2510.13445

Country: North America > United States > Massachusetts > Middlesex County (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.62)

arXiv.org Artificial Intelligence, Oct-15-2025

AI Agents for the Dhumbal Card Game: A Comparative Study

Malla, Sahaj Raj

Abstract: This study evaluates Artificial Intelligence (AI) agents for Dhumbal, a culturally significant multiplayer card game with imperfect information, through a systematic comparison of rule-based, search-based, and learning-based strategies. We formalize Dhumbal's mechanics and implement diverse agents, including heuristic approaches (Aggressive, Conservative, Balanced, Opportunistic), search-based methods such as Monte Carlo Tree Search (MCTS) and Information Set Monte Carlo Tree Search (ISMCTS), reinforcement learning approaches including Deep Q-Network (DQN) and Proximal Policy Optimization (PPO), and a random baseline. Evaluation involves within-category tournaments followed by a cross-category championship. Performance is measured via win rate, economic outcome, Jhyap success, cards discarded per round, risk assessment, and decision efficiency. Statistical significance is assessed using Welch's t-test with Bonferroni correction, effect sizes via Cohen's d, and 95% confidence intervals (CI). Across 1024 simulated rounds, the rule-based Aggressive agent achieves the highest win rate (88.3%, 95% CI: [86.3, 90.3]), outperforming ISMCTS (9.0%) and PPO (1.5%) through effective exploitation of Jhyap declarations. The study contributes a reproducible AI framework, insights into heuristic efficacy under partial information, and open-source code, thereby advancing AI research and supporting digital preservation of cultural games.

Dhumbal, also known as Jhyap in Nepal and Yaniv in Israel, is a traditional draw-and-discard card game that combines strategic decision-making, imperfect information, and risk management. It is widely played across South Asia during family gatherings, festivals, and social events, fostering intergenerational bonds and reflecting communal spirit [1]. Played with 2 to 5 players using a standard 52-card deck, the objective is to minimize the total point value of cards in hand.
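The reported interval for the Aggressive agent can be sanity-checked with a normal-approximation (Wald) interval for a proportion. The abstract does not state which interval method was used or that all 1024 rounds form the denominator, so both are assumptions here, and the function name is illustrative.

```python
import math
from typing import Tuple


def normal_approx_ci(p_hat: float, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wald interval for a proportion: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Assumed inputs: win rate 88.3% over 1024 rounds, as quoted in the abstract.
low, high = normal_approx_ci(0.883, 1024)
print(f"95% CI: [{100 * low:.1f}, {100 * high:.1f}]")  # prints 95% CI: [86.3, 90.3]
```

Under these assumptions the half-width is about 1.97 percentage points, which agrees with the interval quoted in the abstract.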

artificial intelligence, machine learning, reinforcement learning, (17 more...)

2510.11736

Country:

Asia > Nepal (0.34)
Asia > Middle East > Israel (0.24)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Briglia, Giovanni, Fabiano, Francesco, Mariani, Stefano

Scaling Multi-Agent Epistemic Planning through GNN-Derived Heuristics

arXiv.org Artificial Intelligence, Oct-15-2025

Multi-agent Epistemic Planning (MEP) is an autonomous planning framework for reasoning about both the physical world and the beliefs of agents, with applications in domains where information flow and awareness among agents are critical. The richness of MEP requires states to be represented as Kripke structures, i.e., directed labeled graphs. This representation limits the applicability of existing heuristics, hindering the scalability of epistemic solvers, which must explore an exponential search space without guidance, often resulting in intractability. To address this, we exploit Graph Neural Networks (GNNs) to learn patterns and relational structures within epistemic states, to guide the planning process. GNNs, which naturally capture the graph-like nature of Kripke models, allow us to derive meaningful estimates of state quality -- e.g., the distance from the nearest goal -- by generalizing knowledge obtained from previously solved planning instances. We integrate these predictive heuristics into an epistemic planning pipeline and evaluate them against standard baselines, showing improvements in the scalability of multi-agent epistemic planning.
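To show where a learned distance-to-goal estimate plugs into a planner, here is a generic greedy best-first search that takes the heuristic as a callable. It is a sketch, not the paper's pipeline: the integer state space and `toy_heuristic` (standing in for a GNN prediction over a Kripke structure) are hypothetical.

```python
import heapq
from typing import Callable, Hashable, Iterable, List, Optional


def best_first_search(
    start: Hashable,
    is_goal: Callable[[Hashable], bool],
    successors: Callable[[Hashable], Iterable[Hashable]],
    heuristic: Callable[[Hashable], float],
) -> Optional[List[Hashable]]:
    """Greedy best-first search: always expand the state with the lowest estimate."""
    counter = 0  # tie-breaker so the heap never compares states directly
    frontier = [(heuristic(start), counter, start, [start])]
    visited = {start}
    while frontier:
        _, _, state, path = heapq.heappop(frontier)
        if is_goal(state):
            return path
        for nxt in successors(state):
            if nxt not in visited:
                visited.add(nxt)
                counter += 1
                heapq.heappush(frontier, (heuristic(nxt), counter, nxt, path + [nxt]))
    return None


GOAL = 10


def toy_successors(n: int) -> List[int]:
    """Hypothetical actions: add one or double, capped at 20."""
    return [m for m in (n + 1, n * 2) if m <= 20]


def toy_heuristic(n: int) -> float:
    """Stand-in for a learned estimate of the distance to the nearest goal."""
    return float(abs(GOAL - n))


plan = best_first_search(1, lambda n: n == GOAL, toy_successors, toy_heuristic)
print(plan)
```

Swapping `toy_heuristic` for a model's prediction changes only the ordering of the frontier. The search loop itself stays the same, which is what makes learned heuristics easy to drop into an existing planner.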

artificial intelligence, iqm iqr-std, machine learning, (18 more...)

2508.1284

Country:

Europe > Netherlands (0.28)
North America > United States (0.27)
North America > Canada (0.27)
(2 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)