Problem Solving
Transformers are Sample-Efficient World Models
Micheli, Vincent, Alonso, Eloi, Fleuret, François
Deep reinforcement learning agents are notoriously sample inefficient, which considerably limits their application to real-world problems. Recently, many model-based methods have been designed to address this issue, with learning in the imagination of a world model being one of the most prominent approaches. However, while virtually unlimited interaction with a simulated environment sounds appealing, the world model has to be accurate over extended periods of time. Motivated by the success of Transformers in sequence modeling tasks, we introduce IRIS, a data-efficient agent that learns in a world model composed of a discrete autoencoder and an autoregressive Transformer. With the equivalent of only two hours of gameplay in the Atari 100k benchmark, IRIS achieves a mean human normalized score of 1.046, and outperforms humans on 10 out of 26 games, setting a new state of the art for methods without lookahead search. To foster future research on Transformers and world models for sample-efficient reinforcement learning, we release our code and models at https://github.com/eloialonso/iris.
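As a reading aid for the score quoted above, the following is a minimal sketch of how a mean human-normalized score is commonly computed on Atari benchmarks, assuming the usual convention (agent - random) / (human - random) per game. The function names and the per-game numbers are hypothetical placeholders, not values from the paper.

```python
def human_normalized_score(agent_score: float, random_score: float, human_score: float) -> float:
    """Return (agent - random) / (human - random) for a single game."""
    if human_score == random_score:
        raise ValueError("human and random scores must differ")
    return (agent_score - random_score) / (human_score - random_score)


def mean_human_normalized_score(scores: dict) -> float:
    """Average the per-game normalized scores.

    `scores` maps a game name to an (agent, random, human) tuple.
    """
    values = [human_normalized_score(*triple) for triple in scores.values()]
    return sum(values) / len(values)


def count_superhuman(scores: dict) -> int:
    """Count games whose normalized score exceeds 1.0 (agent beats the human reference)."""
    return sum(1 for triple in scores.values() if human_normalized_score(*triple) > 1.0)


# Made-up numbers for illustration only (assumption, not benchmark data).
example_scores = {
    "game_a": (30.0, 10.0, 20.0),  # normalized score 2.0
    "game_b": (5.0, 0.0, 10.0),    # normalized score 0.5
}
print(mean_human_normalized_score(example_scores))
print(count_superhuman(example_scores))
```

Under this convention a score of 1.0 matches the human reference, so a mean above 1.0 can coexist with the agent beating humans on only a subset of games.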
Thrill-K Architecture: Towards a Solution to the Problem of Knowledge Based Understanding
Singer, Gadi, Bach, Joscha, Grinberg, Tetiana, Hakim, Nagib, Howard, Phillip, Lal, Vasudev, Rivlin, Zev
While end-to-end learning systems are rapidly gaining capabilities and popularity, the increasing computational demands for deploying such systems, along with a lack of flexibility, adaptability, explainability, reasoning and verification capabilities, require new types of architectures. Here we introduce a classification of hybrid systems which, based on an analysis of human knowledge and intelligence, combines neural learning with various types of knowledge and knowledge sources. We present the Thrill-K architecture as a prototypical solution for integrating instantaneous knowledge, standby knowledge and external knowledge sources in a framework capable of inference, learning and intelligent control.
Words are all you need? Language as an approximation for human similarity judgments
Marjieh, Raja, van Rijn, Pol, Sucholutsky, Ilia, Sumers, Theodore R., Lee, Harin, Griffiths, Thomas L., Jacoby, Nori
Human similarity judgments are a powerful supervision signal for machine learning applications based on techniques such as contrastive learning, information retrieval, and model alignment, but classical methods for collecting human similarity judgments are too expensive to be used at scale. Recent methods propose using pre-trained deep neural networks (DNNs) to approximate human similarity, but pre-trained DNNs may not be available for certain domains (e.g., medical images, low-resource languages) and their performance in approximating human similarity has not been extensively tested. We conducted an evaluation of 611 pre-trained models across three domains (images, audio, and video) and found that there is a large gap in performance between human similarity judgments and pre-trained DNNs. To address this gap, we propose a new class of similarity approximation methods based on language. To collect the language data required by these new methods, we also developed and validated a novel adaptive tag collection pipeline. We find that our proposed language-based methods are significantly cheaper, in the number of human judgments, than classical methods, but still improve performance over the DNN-based methods. Finally, we also develop 'stacked' methods that combine language embeddings with DNN embeddings, and find that these consistently provide the best approximations for human similarity across all three of our modalities. Based on the results of this comprehensive study, we provide a concise guide for researchers interested in collecting or approximating human similarity data. To accompany this guide, we also release all of the similarity and language data, a total of 206,339 human judgments, that we collected in our experiments, along with a detailed breakdown of all modeling results.
Multi-Valued Neural Networks I: A Multi-Valued Associative Memory
Maximov, Dmitry, Goncharenko, Vladimir I., Legovich, Yury S.
A new concept of multi-valued associative memory is introduced, generalizing the analogous concept in fuzzy neural networks. We extend the results on fuzzy associative memory with thresholds to the multi-valued case: we introduce the novel concept of such a network without numbers, investigate its properties, and give a learning algorithm for the multi-valued case. We derive conditions under which given pairs of network variable patterns can be stored in such a multi-valued associative memory. In the multi-valued neural network, the variables are not numbers but elements or subsets of a lattice, i.e., they are only partially ordered. Lattice operations are used to build the network output from the inputs. In this paper, the lattice is assumed to be Brouwer, and it determines the implication used, together with other lattice operations, to compute the neural network output. We give an example of using the network to classify aircraft/spacecraft trajectories.
The notion of role in conceptual modelling
Reynaud, Chantal, Aussenac-Gilles, Nathalie, Tchounikine, Pierre, Trichet, Franckie
First, we present how the relationship between problem-solving methods and domain models is tackled in different approaches, concentrating on how they cope with this issue in the knowledge engineering process. Second, we introduce several properties that can be used to analyse, characterise and define the notion of role. We evaluate and compare the previously presented works along these dimensions. This analysis suggests some developments to better exploit the relationship between reasoning and domain knowledge.
Meta-World Conditional Neural Processes
We propose Meta-World Conditional Neural Processes (MW-CNP), a conditional world model generator that leverages sample efficiency and scalability of Conditional Neural Processes to enable an agent to sample from its own "hallucination". We intend to reduce the agent's interaction with the target environment at test time as much as possible. To reduce the number of samples required at test time, we first obtain a latent representation of the transition dynamics from a single rollout from the test environment with hidden parameters. Then, we obtain rollouts for few-shot learning by interacting with the "hallucination" generated by the meta-world model. Using the world model representation from MW-CNP, the meta-RL agent can adapt to an unseen target environment with significantly fewer samples collected from the target environment compared to the baselines. We emphasize that the agent does not have access to the task parameters throughout training and testing, and MW-CNP is trained on offline interaction data logged during meta-training.
Is Differentiable Architecture Search truly a One-Shot Method?
Geiping, Jonas, Lukasik, Jovita, Keuper, Margret, Moeller, Michael
Recent progress in computer vision and related fields has illustrated the importance of suitable neural architecture designs and training schemes [He et al., 2015]. Ever deeper and more complex networks show promise, and manual network design is less and less able to explore the desired search spaces. Neural architecture search (NAS) is the task of optimizing the architecture of a neural network automatically without resorting to human selection, scaling to larger search spaces and proposing novel well-performing architectures. NAS, which is an intrinsically discrete problem, has been successfully addressed using black-box optimization approaches such as reinforcement learning [Zoph and Le, 2017; Zoph et al., 2018] or Bayesian optimization [Kandasamy et al., 2018; White et al., 2019; Ru et al., 2020; Lukasik et al., 2021]. However, these approaches are computationally expensive as they require the training of many candidate networks to cover the search space. In contrast, differentiable architecture search (DAS) [Liu et al., 2019] proposes a continuous relaxation of the search problem, i.e. all candidate architectures within a given search space of operations and their connectivity are jointly optimized using shared network parameters while the network also learns to weigh these operations. The final architecture can then simply be deduced by selecting the highest weighted operations. This is appealing as practically good architectures are proposed within a single optimization run. However, previous works such as Zela et al. [2020] also indicate that the proposed results are often sub-optimal, especially when the search space is not well chosen.
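The continuous relaxation and the final selection step described above can be illustrated with a small sketch. It assumes a single edge with three toy candidate operations and softmax-normalized architecture weights; the operation set, the names `mixed_op` and `discretize`, and the weight values are hypothetical, and no training loop is shown.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical candidate operations on one edge of the search space.
CANDIDATE_OPS = {
    "identity": lambda x: x,
    "double": lambda x: 2.0 * x,
    "zero": lambda x: 0.0,
}


def mixed_op(x, alphas):
    """Relaxed edge: softmax-weighted sum of all candidate operations.

    `alphas` maps each operation name to its (learnable) architecture weight.
    """
    weights = softmax([alphas[name] for name in CANDIDATE_OPS])
    return sum(w * op(x) for w, op in zip(weights, CANDIDATE_OPS.values()))


def discretize(alphas):
    """Final architecture choice: keep the highest-weighted operation."""
    return max(alphas, key=alphas.get)


# Equal weights average the three operations; skewed weights pick one of them.
uniform = {"identity": 0.0, "double": 0.0, "zero": 0.0}
learned = {"identity": 0.1, "double": 1.5, "zero": -0.3}  # made-up values
print(mixed_op(3.0, uniform))
print(discretize(learned))
```

In a full search, the architecture weights would be optimized jointly with the shared network parameters; only the forward mixing and the argmax-style selection are sketched here.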
Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions
Liang, Paul Pu, Zadeh, Amir, Morency, Louis-Philippe
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design computer agents with intelligent capabilities such as understanding, reasoning, and learning through integrating multiple communicative modalities, including linguistic, acoustic, visual, tactile, and physiological messages. With the recent interest in video understanding, embodied autonomous agents, text-to-image generation, and multisensor fusion in application domains such as healthcare and robotics, multimodal machine learning has brought unique computational and theoretical challenges to the machine learning community given the heterogeneity of data sources and the interconnections often found between modalities. However, the breadth of progress in multimodal research has made it difficult to identify the common themes and open questions in the field. By synthesizing a broad range of application domains and theoretical frameworks from both historical and recent perspectives, this paper is designed to provide an overview of the computational and theoretical foundations of multimodal machine learning. We start by defining three key principles of modality heterogeneity, connections, and interactions that have driven subsequent innovations, and propose a taxonomy of six core technical challenges: representation, alignment, reasoning, generation, transference, and quantification covering historical and recent trends. Recent technical achievements will be presented through the lens of this taxonomy, allowing researchers to understand the similarities and differences across new approaches. We end by motivating several open problems for future research as identified by our taxonomy.
A Community-Aware Framework for Social Influence Maximization
Umrawal, Abhishek K., Quinn, Christopher J., Aggarwal, Vaneet
We consider the problem of Influence Maximization (IM), the task of selecting $k$ seed nodes in a social network such that the expected number of nodes influenced is maximized. We propose a community-aware divide-and-conquer framework that involves (i) learning the inherent community structure of the social network, (ii) generating candidate solutions by solving the influence maximization problem for each community, and (iii) selecting the final set of seed nodes using a novel progressive budgeting scheme. Our experiments on real-world social networks show that the proposed framework outperforms the standard methods in terms of run-time and the heuristic methods in terms of influence. We also study the effect of the community structure on the performance of the proposed framework. Our experiments show that the community structures with higher modularity lead the proposed framework to perform better in terms of run-time and influence.
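The three-step structure above can be sketched as follows. This is a loose illustration under strong assumptions, not the authors' method: communities are taken as given and disjoint (step i is not implemented), intra-community degree stands in for a per-community influence maximization solver (step ii), and a simple greedy merge across communities stands in for the progressive budgeting scheme (step iii), whose details the abstract does not give. All function names are hypothetical.

```python
import heapq


def intra_degree(graph, community, node):
    """Number of neighbours of `node` that lie inside its own community."""
    return sum(1 for neighbour in graph.get(node, ()) if neighbour in community)


def candidates_per_community(graph, communities):
    """Step (ii) stand-in: rank each community's nodes by intra-community degree."""
    ranked = []
    for community in communities:
        scored = sorted(
            ((intra_degree(graph, community, node), node) for node in community),
            key=lambda pair: (-pair[0], pair[1]),  # high score first, ties by name
        )
        ranked.append(scored)
    return ranked


def select_seeds(graph, communities, k):
    """Step (iii) stand-in: greedily merge per-community rankings until k seeds."""
    ranked = candidates_per_community(graph, communities)
    # Heap entries: (-score, node, community index, position in that ranking).
    heap = [(-r[0][0], r[0][1], i, 0) for i, r in enumerate(ranked) if r]
    heapq.heapify(heap)
    seeds = []
    while heap and len(seeds) < k:
        _, node, index, position = heapq.heappop(heap)
        seeds.append(node)
        if position + 1 < len(ranked[index]):
            score, following = ranked[index][position + 1]
            heapq.heappush(heap, (-score, following, index, position + 1))
    return seeds


# Toy undirected graph as adjacency lists, with two assumed communities.
toy_graph = {
    "a": ["b", "c", "d"], "b": ["a"], "c": ["a"], "d": ["a", "e"],
    "e": ["d", "f"], "f": ["e", "g"], "g": ["f"],
}
toy_communities = [{"a", "b", "c", "d"}, {"e", "f", "g"}]
print(select_seeds(toy_graph, toy_communities, 2))
```

The point of the sketch is the decomposition: each community is scored independently, which is where a divide-and-conquer approach can save run-time, and the budget of k seeds is then spent across communities.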
Foundation Models for Natural Language Processing -- Pre-trained Language Models Integrating Media
Paaß, Gerhard, Giesselbach, Sven
This open access book provides a comprehensive overview of the state of the art in research and applications of Foundation Models and is intended for readers familiar with basic Natural Language Processing (NLP) concepts. In recent years, a revolutionary new paradigm has been developed for training models for NLP. These models are first pre-trained on large collections of text documents to acquire general syntactic knowledge and semantic information. Then, they are fine-tuned for specific tasks, which they can often solve with superhuman accuracy. When the models are large enough, they can be instructed by prompts to solve new tasks without any fine-tuning. Moreover, they can be applied to a wide range of different media and problem domains, ranging from image and video processing to robot control learning. Because they provide a blueprint for solving many tasks in artificial intelligence, they have been called Foundation Models. After a brief introduction to basic NLP models, the main pre-trained language models (BERT, GPT and the sequence-to-sequence transformer) are described, as well as the concepts of self-attention and context-sensitive embedding. Then, different approaches to improving these models are discussed, such as expanding the pre-training criteria, increasing the length of input texts, or including extra knowledge. An overview of the best-performing models for about twenty application areas is then presented, e.g., question answering, translation, story generation, dialog systems, and generating images from text. For each application area, the strengths and weaknesses of current models are discussed, and an outlook on further developments is given. In addition, links are provided to freely available program code. A concluding chapter summarizes the economic opportunities, mitigation of risks, and potential developments of AI.