AITopics

2408.13888

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(4 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Databases (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Rosa, Giacomo, Lipovetzky, Nir

Count-based Novelty Exploration in Classical Planning

arXiv.org Artificial IntelligenceAug-25-2024

Count-based exploration methods are widely employed subdivide planning problems into smaller sub-problems through the to improve the exploratory behavior of learning agents over sequential use of partitioning heuristics to control the direction of search and decision problems. Meanwhile, Novelty search has achieved success increase the number of novel nodes. Katz et al. [13] provide a definition in Classical Planning through recording of the first, but not successive, of novelty of a state with respect to its heuristic estimate, providing occurrences of tuples. In order to structure the exploration, multiple novelty measures which quantify the novelty degree of a however, the number of tuples considered needs to grow exponentially state in terms of the number of novel and non-novel state facts. More as the search progresses. We propose a new novelty technique, recently, Singh et al. [27] introduce approximate novelty, which uses classical count-based novelty, which aims to explore the state space an approximate measurement of state novelty which is more time with a constant number of tuples, by leveraging the frequency of each and memory efficient, proving capable of estimating novelty values tuple's appearance in a search tree. We then justify the mechanisms of cardinality greater than 2 in practical scenarios. Relating Novelty through which lower tuple counts lead the search towards novel tuples.

hamming distance, node, solver, (17 more...)

2408.13719

Country:

Europe > France (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
Europe > Spain (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.87)

arXiv.org Artificial IntelligenceAug-25-2024

Flexible game-playing AI with AlphaViT: adapting to multiple games and board sizes

Fujita, Kazuhisa

This paper presents novel game AI agents based on the AlphaZero framework, enhanced with Vision Transformers (ViT): AlphaViT, AlphaViD, and AlphaVDA. These agents are designed to play various board games of different sizes using a single model, overcoming AlphaZero's limitation of being restricted to a fixed board size. AlphaViT uses only a transformer encoder, while AlphaViD and AlphaVDA contain both an encoder and a decoder. AlphaViD's decoder receives input from the encoder output, while AlphaVDA uses a learnable matrix as decoder input. Using the AlphaZero framework, the three proposed methods demonstrate their versatility in different game environments, including Connect4, Gomoku, and Othello. Experimental results show that these agents, whether trained on a single game or on multiple games simultaneously, consistently outperform traditional algorithms such as Minimax and Monte Carlo tree search using a single DNN with shared weights, while approaching the performance of AlphaZero. In particular, AlphaViT and AlphaViD show strong performance across games, with AlphaViD benefiting from an additional decoder layer that enhances its ability to adapt to different action spaces and board sizes. These results may suggest the potential of transformer-based architectures to develop more flexible and robust game AI agents capable of excelling in multiple games and dynamic environments.

alphavit, alphazero, board size, (17 more...)

2408.13871

Country:

North America > United States > Massachusetts > Suffolk County > Boston (0.04)
Europe > Austria (0.04)
Asia > Japan (0.04)

Genre: Research Report > New Finding (0.48)

Industry:

Leisure & Entertainment > Games > Chess (0.49)
Leisure & Entertainment > Games > Computer Games (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.90)

Muvvala, Karan, Ho, Qi Heng, Lahijanian, Morteza

Beyond Winning Strategies: Admissible and Admissible Winning Strategies for Quantitative Reachability Games

arXiv.org Artificial IntelligenceAug-23-2024

Classical reactive synthesis approaches aim to synthesize a reactive system that always satisfies a given specifications. These approaches often reduce to playing a two-player zero-sum game where the goal is to synthesize a winning strategy. However, in many pragmatic domains, such as robotics, a winning strategy does not always exist, yet it is desirable for the system to make an effort to satisfy its requirements instead of "giving up". To this end, this paper investigates the notion of admissible strategies, which formalize "doing-your-best", in quantitative reachability games. We show that, unlike the qualitative case, quantitative admissible strategies are history-dependent even for finite payoff functions, making synthesis a challenging task. In addition, we prove that admissible strategies always exist but may produce undesirable optimistic behaviors. To mitigate this, we propose admissible winning strategies, which enforce the best possible outcome while being admissible. We show that both strategies always exist but are not memoryless. We provide necessary and sufficient conditions for the existence of both strategies and propose synthesis algorithms. Finally, we illustrate the strategies on gridworld and robot manipulator domains.

acval, aval, cval, (14 more...)

2408.13369

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Colorado > Boulder County > Boulder (0.04)
Asia > Japan > Honshū > Kansai > Wakayama Prefecture > Wakayama (0.04)

Genre:

Research Report (1.00)
Overview (0.68)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.46)

arXiv.org Artificial IntelligenceAug-23-2024

SUMO: Search-Based Uncertainty Estimation for Model-Based Offline Reinforcement Learning

Qiao, Zhongjian, Lyu, Jiafei, Jiao, Kechen, Liu, Qi, Li, Xiu

The performance of offline reinforcement learning (RL) suffers from the limited size and quality of static datasets. Model-based offline RL addresses this issue by generating synthetic samples through a dynamics model to enhance overall performance. To evaluate the reliability of the generated samples, uncertainty estimation methods are often employed. However, model ensemble, the most commonly used uncertainty estimation method, is not always the best choice. In this paper, we propose a \textbf{S}earch-based \textbf{U}ncertainty estimation method for \textbf{M}odel-based \textbf{O}ffline RL (SUMO) as an alternative. SUMO characterizes the uncertainty of synthetic samples by measuring their cross entropy against the in-distribution dataset samples, and uses an efficient search-based method for implementation. In this way, SUMO can achieve trustworthy uncertainty estimation. We integrate SUMO into several model-based offline RL algorithms including MOPO and Adapted MOReL (AMOReL), and provide theoretical analysis for them. Extensive experimental results on D4RL datasets demonstrate that SUMO can provide more accurate uncertainty estimation and boost the performance of base algorithms. These indicate that SUMO could be a better uncertainty estimator for model-based offline RL when used in either reward penalty or trajectory truncation. Our code is available and will be open-source for further research and development.

dataset, sumo, uncertainty estimation, (17 more...)

2408.1297

Country:

North America > United States (0.06)
Asia > China > Guangdong Province > Shenzhen (0.04)
Asia > China > Heilongjiang Province > Harbin (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Zeroth-Order Stochastic Mirror Descent Algorithms for Minimax Excess Risk Optimization

Gu, Zhihao, Xu, Zi

The minimax excess risk optimization (MERO) problem is a new variation of the traditional distributionally robust optimization (DRO) problem, which achieves uniformly low regret across all test distributions under suitable conditions. In this paper, we propose a zeroth-order stochastic mirror descent (ZO-SMD) algorithm available for both smooth and non-smooth MERO to estimate the minimal risk of each distrbution, and finally solve MERO as (non-)smooth stochastic convex-concave (linear) minimax optimization problems. The proposed algorithm is proved to converge at optimal convergence rates of $\mathcal{O}\left(1/\sqrt{t}\right)$ on the estimate of $R_i^*$ and $\mathcal{O}\left(1/\sqrt{t}\right)$ on the optimization error of both smooth and non-smooth MERO. Numerical results show the efficiency of the proposed algorithm.

algorithm, optimization, zeroth-order stochastic mirror descent algorithm, (9 more...)

2408.12209

Country:

Asia > China > Shanghai > Shanghai (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Russia (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.81)

Bessa, Swann, Dabert, Darius, Bourgeat, Max, Rousseau, Louis-Martin, Cappart, Quentin

Learning Valid Dual Bounds in Constraint Programming: Boosted Lagrangian Decomposition with Self-Supervised Learning

Lagrangian decomposition (LD) is a relaxation method that provides a dual bound for constrained optimization problems by decomposing them into more manageable sub-problems. This bound can be used in branch-and-bound algorithms to prune the search space effectively. In brief, a vector of Lagrangian multipliers is associated with each sub-problem, and an iterative procedure (e.g., a sub-gradient optimization) adjusts these multipliers to find the tightest bound. Initially applied to integer programming, Lagrangian decomposition also had success in constraint programming due to its versatility and the fact that global constraints provide natural sub-problems. However, the non-linear and combinatorial nature of sub-problems in constraint programming makes it computationally intensive to optimize the Lagrangian multipliers with sub-gradient methods at each node of the tree search. This currently limits the practicality of LD as a general bounding mechanism for constraint programming. To address this challenge, we propose a self-supervised learning approach that leverages neural networks to generate multipliers directly, yielding tight bounds. This approach significantly reduces the number of sub-gradient optimization steps required, enhancing the pruning efficiency and reducing the execution time of constraint programming solvers. This contribution is one of the few that leverage learning to enhance bounding mechanisms on the dual side, a critical element in the design of combinatorial solvers. To our knowledge, this work presents the first generic method for learning valid dual bounds in constraint programming.

constraint, constraint programming, multiplier, (15 more...)

2408.12695

Country:

Europe > Austria > Vienna (0.14)
North America > Canada > Quebec > Montreal (0.04)
North America > United States > Virginia > Alexandria County > Alexandria (0.04)
(7 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (1.00)

UNCO: Towards Unifying Neural Combinatorial Optimization through Large Language Model

Jiang, Xia, Wu, Yaoxin, Wang, Yuan, Zhang, Yingqian

Recently, applying neural networks to address combinatorial optimization problems (COPs) has attracted considerable research attention. The prevailing methods always train deep models independently on specific problems, lacking a unified framework for concurrently tackling various COPs. To this end, we propose a unified neural combinatorial optimization (UNCO) framework to solve different types of COPs by a single model. Specifically, we use natural language to formulate text-attributed instances for different COPs and encode them in the same embedding space by the large language model (LLM). The obtained embeddings are further advanced by an encoder-decoder model without any problem-specific modules, thereby facilitating a unified process of solution construction. We further adopt the conflict gradients erasing reinforcement learning (CGERL) algorithm to train the UNCO model, delivering better performance across different COPs than vanilla multi-objective learning. Experiments show that the UNCO model can solve multiple COPs after a single-session training, and achieves satisfactory performance that is comparable to several traditional or learning-based baselines. Instead of pursuing the best performance for each COP, we explore the synergy between tasks and few-shot generalization based on LLM to inspire future work.

different cop, llm, optimization, (13 more...)

2408.12214

Country:

Europe > Netherlands > North Brabant > Eindhoven (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (0.64)

Industry: Transportation (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Deng, Yuyang, Qiao, Fuli, Mahdavi, Mehrdad

Stochastic Compositional Minimax Optimization with Provable Convergence Guarantees

Stochastic compositional minimax problems are prevalent in machine learning, yet there are only limited established on the convergence of this class of problems. In this paper, we propose a formal definition of the stochastic compositional minimax problem, which involves optimizing a minimax loss with a compositional structure either in primal , dual, or both primal and dual variables. We introduce a simple yet effective algorithm, stochastically Corrected stOchastic gradient Descent Ascent (CODA), which is a descent ascent type algorithm with compositional correction steps, and establish its convergence rate in aforementioned three settings. In the presence of the compositional structure in primal, the objective function typically becomes nonconvex in primal due to function composition. Thus, we consider the nonconvex-strongly-concave and nonconvex-concave settings and show that CODA can efficiently converge to a stationary point. In the case of composition on the dual, the objective function becomes nonconcave in the dual variable, and we demonstrate convergence in the strongly-convex-nonconcave and convex-nonconcave setting. In the case of composition on both variables, the primal and dual variables may lose convexity and concavity, respectively. Therefore, we anaylze the convergence in weakly-convex-weakly-concave setting. We also give a variance reduction version algorithm, CODA+, which achieves the best known rate on nonconvex-strongly-concave and nonconvex-concave compositional minimax problem. This work initiates the theoretical study of the stochastic compositional minimax problem on various settings and may inform modern machine learning scenarios such as domain adaptation or robust model-agnostic meta-learning.

algorithm, assumption, minimax problem, (13 more...)

2408.12505

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Pennsylvania (0.04)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

Lin, Tianyi, Jin, Chi, Jordan, Michael. I.

Two-Timescale Gradient Descent Ascent Algorithms for Nonconvex Minimax Optimization

arXiv.org Artificial IntelligenceAug-21-2024

We provide a unified analysis of two-timescale gradient descent ascent (TTGDA) for solving structured nonconvex minimax optimization problems in the form of $\min_\textbf{x} \max_{\textbf{y} \in Y} f(\textbf{x}, \textbf{y})$, where the objective function $f(\textbf{x}, \textbf{y})$ is nonconvex in $\textbf{x}$ and concave in $\textbf{y}$, and the constraint set $Y \subseteq \mathbb{R}^n$ is convex and bounded. In the convex-concave setting, the single-timescale GDA achieves strong convergence guarantees and has been used for solving application problems arising from operations research and computer science. However, it can fail to converge in more general settings. Our contribution in this paper is to design the simple deterministic and stochastic TTGDA algorithms that efficiently find one stationary point of the function $\Phi(\cdot) := \max_{\textbf{y} \in Y} f(\cdot, \textbf{y})$. Specifically, we prove the theoretical bounds on the complexity of solving both smooth and nonsmooth nonconvex-concave minimax optimization problems. To our knowledge, this is the first systematic analysis of TTGDA for nonconvex minimax optimization, shedding light on its superior performance in training generative adversarial networks (GANs) and in solving other real-world application problems.

algorithm 1, gradient evaluation, inequality, (13 more...)

2408.11974

Country:

Asia > Middle East > Jordan (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Massachusetts (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.85)