South America
Student-Initiated Action Advising via Advice Novelty
Ilhan, Ercument, Perez-Liebana, Diego
Action advising is a knowledge exchange mechanism between peers, namely student and teacher, that can help tackle exploration and sample inefficiency problems in deep reinforcement learning. Due to the practical limitations in peer-to-peer communication and the negative implications of over-advising, the peer responsible for initiating these interactions needs to do so only when it's most adequate to exchange advice. Most recently, student-initiated techniques that utilise state novelty and uncertainty estimations have obtained promising results. However, these estimations have several weaknesses, such as having no information regarding the characteristics of convergence and being subject to delays that occur in the presence of experience replay dynamics. We propose a student-initiated action advising algorithm that alleviates these shortcomings. Specifically, we employ Random Network Distillation (RND) to measure the novelty of an advice, for the student to determine whether to proceed with the request; furthermore, we perform RND updates only for the advised states to ensure that the student's convergence will not prevent it from utilising the teacher's knowledge at any stage of learning. Experiments in GridWorld and simplified versions of five Atari games show that our approach can perform on par with the state-of-the-art and demonstrate significant advantages in the scenarios where the existing methods are prone to fail.
Stochastic Bayesian Neural Networks
Bayesian neural networks perform variational inference over the weights however calculation of the posterior distribution remains a challenge. Our work builds on variational inference techniques for bayesian neural networks using the original Evidence Lower Bound. In this paper, we present a stochastic bayesian neural network in which we maximize Evidence Lower Bound using a new objective function which we name as Stochastic Evidence Lower Bound. We evaluate our network on 5 publicly available UCI datasets using test RMSE and log likelihood as the evaluation metrics. We demonstrate that our work not only beats the previous state of the art algorithms but is also scalable to larger datasets.
A Direct-Indirect Hybridization Approach to Control-Limited DDP
Mastalli, Carlos, Merkt, Wolfgang, Marti-Saumell, Josep, Sola, Joan, Mansard, Nicolas, Vijayakumar, Sethu
Optimal control is a widely used tool for synthesizing motions and controls for user-defined tasks under physical constraints. A common approach is to formulate it using direct multiple-shooting and then to use off-the-shelf nonlinear programming solvers that can easily handle arbitrary constraints on the controls and states. However, these methods are not fast enough for many robotics applications such as real-time humanoid motor control. Exploiting the sparse structure of optimal control problem, such as in Differential DynamicProgramming (DDP), has proven to significantly boost the computational efficiency, and recent works have been focused on handling arbitrary constraints. Despite that, DDP has been associated with poor numerical convergence, particularly when considering long time horizons. One of the main reasons is due to system instabilities and poor warm-starting (only controls). This paper presents control-limited Feasibility-driven DDP (Box-FDDP), a solver that incorporates a direct-indirect hybridization of the control-limited DDP algorithm. Concretely, the forward and backward passes handle feasibility and control limits. We showcase the impact and importance of our method on a set of challenging optimal control problems against the Box-DDP and squashing-function approach.
Workflow Provenance in the Lifecycle of Scientific Machine Learning
Souza, Renan, Azevedo, Leonardo G., Lourenço, Vítor, Soares, Elton, Thiago, Raphael, Brandão, Rafael, Civitarese, Daniel, Brazil, Emilio Vital, Moreno, Marcio, Valduriez, Patrick, Mattoso, Marta, Cerqueira, Renato, Netto, Marco A. S.
Machine Learning (ML) has been fundamentally transforming several industries and businesses in numerous ways. More recently, it has also been impacting computational science and engineering domains, such as geoscience, climate science, material science, and health science. Scientific ML, i.e., ML applied to these domains, is characterized by the combination of data-driven techniques with domain-specific data and knowledge to obtain models of physical phenomena [1], [2], [3], [4], [5]. Obtaining models in scientific ML works similarly to conducting traditional large-scale computational experiments [6], which involve a team of scientists and engineers that formulate hypotheses, design the experiment and predefine parameters and input datasets, analyze the experiment data, do observations, and calibrate initial assumptions in a cycle until they are satisfied with the results. Scientific ML is naturally large-scale because multiple people collaborate in a project, using their multidisciplinary domain-specific knowledge to design and perform data-intensive tasks to curate (i.e., understand, clean, enrich with observations) datasets and prepare for learning algorithms. They then plan and execute compute-intensive tasks for computational simulations or training ML models affected by the scientific domain's constraints. They utilize specialized scientific software tools running either on their desktops, on cloud clusters (e.g., Docker-based), or large HPC machines.
MARS-Gym: A Gym framework to model, train, and evaluate Recommender Systems for Marketplaces
Santana, Marlesson R. O., Melo, Luckeciano C., Camargo, Fernando H. F., Brandão, Bruno, Soares, Anderson, Oliveira, Renan M., Caetano, Sandor
Recommender Systems are especially challenging for marketplaces since they must maximize user satisfaction while maintaining the healthiness and fairness of such ecosystems. In this context, we observed a lack of resources to design, train, and evaluate agents that learn by interacting within these environments. For this matter, we propose MARS-Gym, an open-source framework to empower researchers and engineers to quickly build and evaluate Reinforcement Learning agents for recommendations in marketplaces. MARS-Gym addresses the whole development pipeline: data processing, model design and optimization, and multi-sided evaluation. We also provide the implementation of a diverse set of baseline agents, with a metrics-driven analysis of them in the Trivago marketplace dataset, to illustrate how to conduct a holistic assessment using the available metrics of recommendation, off-policy estimation, and fairness. With MARS-Gym, we expect to bridge the gap between academic research and production systems, as well as to facilitate the design of new algorithms and applications.
Rethinking Attention with Performers
Choromanski, Krzysztof, Likhosherstov, Valerii, Dohan, David, Song, Xingyou, Gane, Andreea, Sarlos, Tamas, Hawkins, Peter, Davis, Jared, Mohiuddin, Afroz, Kaiser, Lukasz, Belanger, David, Colwell, Lucy, Weller, Adrian
We introduce Performers, Transformer architectures which can estimate regular (softmax) full-rank-attention Transformers with provable accuracy, but using only linear (as opposed to quadratic) space and time complexity, without relying on any priors such as sparsity or low-rankness. To approximate softmax attention-kernels, Performers use a novel Fast Attention Via positive Orthogonal Random features approach (FAVOR+), which may be of independent interest for scalable kernel methods. FAVOR+ can be also used to efficiently model kernelizable attention mechanisms beyond softmax. This representational power is crucial to accurately compare softmax with other kernels for the first time on large-scale tasks, beyond the reach of regular Transformers, and investigate optimal attention-kernels. Performers are linear architectures fully compatible with regular Transformers and with strong theoretical guarantees: unbiased or nearly-unbiased estimation of the attention matrix, uniform convergence and low estimation variance. We tested Performers on a rich set of tasks stretching from pixel-prediction through text models to protein sequence modeling. We demonstrate competitive results with other examined efficient sparse and dense attention methods, showcasing effectiveness of the novel attention-learning paradigm leveraged by Performers.
Carousel Personalization in Music Streaming Apps with Contextual Bandits
Bendada, Walid, Salha, Guillaume, Bontempelli, Théo
Media services providers, such as music streaming platforms, frequently leverage swipeable carousels to recommend personalized content to their users. However, selecting the most relevant items (albums, artists, playlists...) to display in these carousels is a challenging task, as items are numerous and as users have different preferences. In this paper, we model carousel personalization as a contextual multi-armed bandit problem with multiple plays, cascade-based updates and delayed batch feedback. We empirically show the effectiveness of our framework at capturing characteristics of real-world carousels by addressing a large-scale playlist recommendation task on a global music streaming mobile app. Along with this paper, we publicly release industrial data from our experiments, as well as an open-source environment to simulate comparable carousel personalization learning problems.
Multi-Agent Systems based on Contextual Defeasible Logic considering Focus
Monte-Alto, Helio H. L. C., Morveli-Espinoza, Mariela, Tacla, Cesar A.
In this paper, we extend previous work on distributed reasoning using Contextual Defeasible Logic (CDL), which enables decentralised distributed reasoning based on a distributed knowledge base, such that the knowledge from different knowledge bases may conflict with each other. However, there are many use case scenarios that are not possible to represent in this model. One kind of such scenarios are the ones that require that agents share and reason with relevant knowledge when issuing a query to others. Another kind of scenarios are those in which the bindings among the agents (defined by means of mapping rules) are not static, such as in knowledge-intensive and dynamic environments. This work presents a multi-agent model based on CDL that not only allows agents to reason with their local knowledge bases and mapping rules, but also allows agents to reason about relevant knowledge (focus) -- which are not known by the agents a priori -- in the context of a specific query. We present a use case scenario, some formalisations of the model proposed, and an initial implementation based on the BDI (Belief-Desire-Intention) agent model.
Creative Captioning: An AI Grand Challenge Based on the Dixit Board Game
Kunda, Maithilee, Rabkina, Irina
We propose a new class of "grand challenge" AI problems that we call creative captioning---generating clever, interesting, or abstract captions for images, as well as understanding such captions. Creative captioning draws on core AI research areas of vision, natural language processing, narrative reasoning, and social reasoning, and across all these areas, it requires sophisticated uses of common sense and cultural knowledge. In this paper, we analyze several specific research problems that fall under creative captioning, using the popular board game Dixit as both inspiration and proposed testing ground. We expect that Dixit could serve as an engaging and motivating benchmark for creative captioning across numerous AI research communities for the coming 1-2 decades.
Encoding cloth manipulations using a graph of states and transitions
Borràs, Júlia, Alenyà, Guillem, Torras, Carme
Abstract-- Cloth manipulation is very relevant for domestic robotic tasks, but it presents many challenges due to the complexity of representing, recognizing and predicting behaviour of cloth under manipulation. In this work, we propose a generic, compact and simplified representation of the states of cloth manipulation that allows for representing tasks as sequences of states and transitions. We also define a graph of manipulation primitives that encodes all the strategies to accomplish a task. Our novel representation is used to encode the task of folding a napkin, learned from an experiment with human subjects with video and motion data. We show how our simplified representation allows to obtain a map of meaningful motion primitives and to segment the motion data to obtain sets of trajectories, velocity and acceleration profiles corresponding to each manipulation primitive in the graph.