Goto

Collaborating Authors

 Search


Know your Enemy: Investigating Monte-Carlo Tree Search with Opponent Models in Pommerman

arXiv.org Artificial Intelligence

In combination with Reinforcement Learning, Monte-Carlo Tree Search has shown to outperform human grandmasters in games such as Chess, Shogi and Go with little to no prior domain knowledge. However, most classical use cases only feature up to two players. Scaling the search to an arbitrary number of players presents a computational challenge, especially if decisions have to be planned over a longer time horizon. In this work, we investigate techniques that transform general-sum multiplayer games into single-player and two-player games that consider other agents to act according to given opponent models. For our evaluation, we focus on the challenging Pommerman environment which involves partial observability, a long time horizon and sparse rewards. In combination with our search methods, we investigate the phenomena of opponent modeling using heuristics and self-play. Overall, we demonstrate the effectiveness of our multiplayer search variants both in a supervised learning and reinforcement learning setting.


NAS-FM: Neural Architecture Search for Tunable and Interpretable Sound Synthesis based on Frequency Modulation

arXiv.org Artificial Intelligence

Developing digital sound synthesizers is crucial to the music industry as it provides a low-cost way to produce high-quality sounds with rich timbres. Existing traditional synthesizers often require substantial expertise to determine the overall framework of a synthesizer and the parameters of submodules. Since expert knowledge is hard to acquire, it hinders the flexibility to quickly design and tune digital synthesizers for diverse sounds. In this paper, we propose ``NAS-FM'', which adopts neural architecture search (NAS) to build a differentiable frequency modulation (FM) synthesizer. Tunable synthesizers with interpretable controls can be developed automatically from sounds without any prior expert knowledge and manual operating costs. In detail, we train a supernet with a specifically designed search space, including predicting the envelopes of carriers and modulators with different frequency ratios. An evolutionary search algorithm with adaptive oscillator size is then developed to find the optimal relationship between oscillators and the frequency ratio of FM. Extensive experiments on recordings of different instrument sounds show that our algorithm can build a synthesizer fully automatically, achieving better results than handcrafted synthesizers. Audio samples are available at https://nas-fm.github.io/.


Interactive Natural Language Processing

arXiv.org Artificial Intelligence

Interactive Natural Language Processing (iNLP) has emerged as a novel paradigm within the field of NLP, aimed at addressing limitations in existing frameworks while aligning with the ultimate goals of artificial intelligence. This paradigm considers language models as agents capable of observing, acting, and receiving feedback iteratively from external entities. Specifically, language models in this context can: (1) interact with humans for better understanding and addressing user needs, personalizing responses, aligning with human values, and improving the overall user experience; (2) interact with knowledge bases for enriching language representations with factual knowledge, enhancing the contextual relevance of responses, and dynamically leveraging external information to generate more accurate and informed responses; (3) interact with models and tools for effectively decomposing and addressing complex tasks, leveraging specialized expertise for specific subtasks, and fostering the simulation of social behaviors; and (4) interact with environments for learning grounded representations of language, and effectively tackling embodied tasks such as reasoning, planning, and decision-making in response to environmental observations. This paper offers a comprehensive survey of iNLP, starting by proposing a unified definition and framework of the concept. We then provide a systematic classification of iNLP, dissecting its various components, including interactive objects, interaction interfaces, and interaction methods. We proceed to delve into the evaluation methodologies used in the field, explore its diverse applications, scrutinize its ethical and safety issues, and discuss prospective research directions. This survey serves as an entry point for researchers who are interested in this rapidly evolving area and offers a broad view of the current landscape and future trajectory of iNLP.


Further Decimating the Inductive Programming Search Space with Instruction Digrams

arXiv.org Artificial Intelligence

Overlapping instruction subsets derived from human originated code have previously been shown to dramatically shrink the inductive programming search space, often by many orders of magnitude. Here we extend the instruction subset approach to consider direct instruction-instruction applications (or instruction digrams) as an additional search heuristic for inductive programming. In this study we analyse the frequency distribution of instruction digrams in a large sample of open source code. This indicates that the instruction digram distribution is highly skewed with over 93% of possible instruction digrams not represnted in the code sample. We demonstrate that instruction digrams can be used to constrain instruction selection during search, further reducing size of the the search space, in some cases by several orders of magnitude. This significantly increases the size of programs that can be generated using search based inductive programming techniques. We discuss the results and provide some suggestions for further work.


A network community detection method with integration of data from multiple layers and node attributes

arXiv.org Machine Learning

Multilayer networks are in the focus of the current complex network study. In such networks multiple types of links may exist as well as many attributes for nodes. To fully use multilayer -- and other types of complex networks in applications, the merging of various data with topological information renders a powerful analysis. First, we suggest a simple way of representing network data in a data matrix where rows correspond to the nodes, and columns correspond to the data items. The number of columns is allowed to be arbitrary, so that the data matrix can be easily expanded by adding columns. The data matrix can be chosen according to targets of the analysis, and may vary a lot from case to case. Next, we partition the rows of the data matrix into communities using a method which allows maximal compression of the data matrix. For compressing a data matrix, we suggest to extend so called regular decomposition method for non-square matrices. We illustrate our method for several types of data matrices, in particular, distance matrices, and matrices obtained by augmenting a distance matrix by a column of node degrees, or by concatenating several distances matrices corresponding to layers of a multilayer network. We illustrate our method with synthetic power-law graphs and two real networks: an Internet autonomous systems graph and a world airline graph. We compare the outputs of different community recovery methods on these graphs, and discuss how incorporating node degrees as a separate column to the data matrix leads our method to identify community structures well-aligned with tiered hierarchical structures commonly encountered in complex scale-free networks.


Pre-trained Mixed Integer Optimization through Multi-variable Cardinality Branching

arXiv.org Artificial Intelligence

We propose a new method to accelerate online Mixed Integer Optimization with Pre-trained machine learning models (PreMIO). The key component of PreMIO is a multi-variable cardinality branching procedure that splits the feasible region with data-driven hyperplanes, which can be easily integrated into any MIP solver with two lines of code. Moreover, we incorporate learning theory and concentration inequalities to develop a straightforward and interpretable hyper-parameter selection strategy for our method. We test the performance of PreMIO by applying it to state-of-the-art MIP solvers and running numerical experiments on both classical OR benchmark datasets and real-life instances. The results validate the effectiveness of our proposed method.


Horizon-Free and Variance-Dependent Reinforcement Learning for Latent Markov Decision Processes

arXiv.org Artificial Intelligence

We study regret minimization for reinforcement learning (RL) in Latent Markov Decision Processes (LMDPs) with context in hindsight. We design a novel model-based algorithmic framework which can be instantiated with both a model-optimistic and a value-optimistic solver. We prove an $\tilde{O}(\sqrt{\mathsf{Var}^\star M \Gamma S A K})$ regret bound where $\tilde{O}$ hides logarithm factors, $M$ is the number of contexts, $S$ is the number of states, $A$ is the number of actions, $K$ is the number of episodes, $\Gamma \le S$ is the maximum transition degree of any state-action pair, and $\mathsf{Var}^\star$ is a variance quantity describing the determinism of the LMDP. The regret bound only scales logarithmically with the planning horizon, thus yielding the first (nearly) horizon-free regret bound for LMDP. This is also the first problem-dependent regret bound for LMDP. Key in our proof is an analysis of the total variance of alpha vectors (a generalization of value functions), which is handled with a truncation method. We complement our positive result with a novel $\Omega(\sqrt{\mathsf{Var}^\star M S A K})$ regret lower bound with $\Gamma = 2$, which shows our upper bound minimax optimal when $\Gamma$ is a constant for the class of variance-bounded LMDPs. Our lower bound relies on new constructions of hard instances and an argument inspired by the symmetrization technique from theoretical computer science, both of which are technically different from existing lower bound proof for MDPs, and thus can be of independent interest.


Autoregressive Modeling with Lookahead Attention

arXiv.org Artificial Intelligence

To predict the next token, autoregressive models However, those NP-hard distributions are artificial. For naturally ordinarily examine the past. Could they also benefit occurring sequences, why might one expect lookahead from also examining hypothetical futures? We to help autoregressive modeling? We argue that when the consider a novel Transformer-based autoregressive sequences represent an agent's behavior, an autoregressive architecture that estimates the next-token distribution parameterization is not always the simplest description. If by extrapolating multiple continuations the behavior is goal-directed--for example, an agent trying of the past, according to some proposal distribution, to achieve high reward in a Markov Decision Process--then and attending to these extended strings. This the simplest description may include a characterization of architecture draws insights from classical AI systems the agent's environment and goals. Even if the agent explicitly such as board game players: when making consults an autoregressive policy p(action | state) a local decision, a policy may benefit from exploring at each step, that policy is not arbitrary: while it may appear possible future trajectories and analyzing complex, it was shaped by reinforcement learning or them. On multiple tasks including morphological by natural selection so as to achieve high-reward trajectories.


Optimal Low-Rank Matrix Completion: Semidefinite Relaxations and Eigenvector Disjunctions

arXiv.org Artificial Intelligence

Low-rank matrix completion consists of computing a matrix of minimal complexity that recovers a given set of observations as accurately as possible, and has numerous applications such as product recommendation. Unfortunately, existing methods for solving low-rank matrix completion are heuristics that, while highly scalable and often identifying high-quality solutions, do not possess any optimality guarantees. We reexamine matrix completion with an optimality-oriented eye, by reformulating low-rank problems as convex problems over the non-convex set of projection matrices and implementing a disjunctive branch-and-bound scheme that solves them to certifiable optimality. Further, we derive a novel and often tight class of convex relaxations by decomposing a low-rank matrix as a sum of rank-one matrices and incentivizing, via a Shor relaxation, that each two-by-two minor in each rank-one matrix has determinant zero. In numerical experiments, our new convex relaxations decrease the optimality gap by two orders of magnitude compared to existing attempts. Moreover, we showcase the performance of our disjunctive branch-and-bound scheme and demonstrate that it solves matrix completion problems over 150x150 matrices to certifiable optimality in hours, constituting an order of magnitude improvement on the state-of-the-art for certifiably optimal methods.


Alignment of Density Maps in Wasserstein Distance

arXiv.org Machine Learning

In this paper we propose an algorithm for aligning three-dimensional objects when represented as density maps, motivated by applications in cryogenic electron microscopy. The algorithm is based on minimizing the 1-Wasserstein distance between the density maps after a rigid transformation. The induced loss function enjoys a more benign landscape than its Euclidean counterpart and Bayesian optimization is employed for computation. Numerical experiments show improved accuracy and efficiency over existing algorithms on the alignment of real protein molecules. In the context of aligning heterogeneous pairs, we illustrate a potential need for new distance functions.