Country
Rewriting History with Inverse RL: Hindsight Inference for Policy Improvement
Eysenbach, Benjamin, Geng, Xinyang, Levine, Sergey, Salakhutdinov, Ruslan
Multi-task reinforcement learning (RL) aims to simultaneously learn policies for solving many tasks. Several prior works have found that relabeling past experience with different reward functions can improve sample efficiency. Relabeling methods typically ask: if, in hindsight, we assume that our experience was optimal for some task, for what task was it optimal? In this paper, we show that hindsight relabeling is inverse RL, an observation that suggests that we can use inverse RL in tandem with RL algorithms to efficiently solve many tasks. We use this idea to generalize goal-relabeling techniques from prior work to arbitrary classes of tasks. Our experiments confirm that relabeling data using inverse RL accelerates learning in general multi-task settings, including goal-reaching, domains with discrete sets of rewards, and those with linear reward functions.
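The relabeling question in this abstract can be illustrated with a small sketch. It assumes a finite set of candidate tasks with a uniform prior, and it scores each trajectory under each task by its total reward minus a per-task normalizer estimated from the batch. The function names and the batch-based normalizer are illustrative assumptions, not the authors' implementation.

```python
import math


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def relabel_posterior(returns):
    """Return a distribution over tasks for each trajectory.

    returns[i][j] is the total reward of trajectory i under candidate task j.
    Assumption: uniform prior over tasks; the per-task normalizer is
    estimated from the trajectories in this batch.
    """
    num_tasks = len(returns[0])
    # Per-task normalizer: tasks that reward every trajectory highly are
    # discounted, so they do not win every relabeling decision.
    log_z = [log_mean_exp([row[j] for row in returns]) for j in range(num_tasks)]
    posteriors = []
    for row in returns:
        logits = [r - z for r, z in zip(row, log_z)]
        top = max(logits)
        weights = [math.exp(l - top) for l in logits]
        total = sum(weights)
        posteriors.append([w / total for w in weights])
    return posteriors


# Task 0 gives every trajectory a return of 100; task 1 separates them.
posterior = relabel_posterior([[100.0, 1.0], [100.0, 0.0]])
```

In this toy batch the first trajectory is assigned mostly to task 1, even though its raw return under task 0 is far larger, because task 0 does not distinguish between the trajectories.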
Language-Independent Tokenisation Rivals Language-Specific Tokenisation for Word Similarity Prediction
Bollegala, Danushka, Kiryo, Ryuichi, Tsujino, Kosuke, Yukawa, Haruki
Language-independent tokenisation (LIT) methods that do not require labelled language resources or lexicons have recently gained popularity because of their applicability in resource-poor languages. Moreover, they compactly represent a language using a fixed size vocabulary and can efficiently handle unseen or rare words. On the other hand, language-specific tokenisation (LST) methods have a long and established history, and are developed using carefully created lexicons and training resources. Unlike subtokens produced by LIT methods, LST methods produce valid morphological subwords. Despite the contrasting tradeoffs between LIT and LST methods, their performance on downstream NLP tasks remains unclear. In this paper, we empirically compare the two approaches using semantic similarity measurement as an evaluation task across a diverse set of languages. Our experimental results covering eight languages show that LST consistently outperforms LIT when the vocabulary size is large, but LIT can produce comparable or better results than LST in many languages with comparatively smaller (i.e. less than 100K words) vocabulary sizes, encouraging the use of LIT when language-specific resources are unavailable or incomplete, or when a smaller model is required. Moreover, we find smoothed inverse frequency (SIF) to be an accurate method to create word embeddings from subword embeddings for multilingual semantic similarity prediction tasks. Further analysis of the nearest neighbours of tokens shows that semantically and syntactically related tokens are closely embedded in subword embedding spaces.
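As a rough illustration of the SIF step mentioned above, the sketch below composes a word vector from subword vectors using the standard SIF weight a / (a + p(s)), where p(s) is the unigram probability of subtoken s. The function name, the toy vocabulary, and the default a = 1e-3 are assumptions made for this example; the principal-component removal that usually follows SIF averaging is left out.

```python
def sif_word_embedding(subtokens, embeddings, unigram_prob, a=1e-3):
    """Weighted average of subword vectors with SIF weights a / (a + p(s)).

    Assumption: `embeddings` maps subtoken -> list of floats and
    `unigram_prob` maps subtoken -> probability in some corpus.
    """
    if not subtokens:
        raise ValueError("need at least one subtoken")
    dim = len(embeddings[subtokens[0]])
    total = [0.0] * dim
    for tok in subtokens:
        weight = a / (a + unigram_prob[tok])  # frequent subtokens count less
        for k, x in enumerate(embeddings[tok]):
            total[k] += weight * x
    return [x / len(subtokens) for x in total]


# Hypothetical two-dimensional subword vectors and frequencies.
toy_embeddings = {"un": [1.0, 0.0], "happy": [0.0, 1.0]}
toy_probs = {"un": 0.1, "happy": 0.001}
vector = sif_word_embedding(["un", "happy"], toy_embeddings, toy_probs)
```

Here the frequent subtoken "un" contributes far less to the word vector than the rarer "happy", which is the intended effect of the smoothing.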
Teaching the Old Dog New Tricks: Supervised Learning with Constraints
Detassis, Fabrizio, Lombardi, Michele, Milano, Michela
Methods for taking into account external knowledge in Machine Learning models have the potential to address outstanding issues in data-driven AI methods, such as improving safety and fairness, and can simplify training in the presence of scarce data. We propose a simple, but effective, method for injecting constraints at training time in supervised learning, based on decomposition and bi-level optimization: a master step is in charge of enforcing the constraints, while a learner step takes care of training the model. The process leads to approximate constraint satisfaction. The method is applicable to any ML approach for which the concept of label (or target) is well defined (most regression and classification scenarios), and allows existing training algorithms to be reused without modification. We require no assumption on the constraints, although their properties affect the shape and complexity of the master problem. Convergence guarantees are hard to provide, but we found that the approach performs well on ML tasks with fairness constraints and on classical datasets with synthetic constraints.
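The master/learner alternation can be sketched on a toy problem. The example below assumes a one-dimensional least-squares learner and a simple upper-bound constraint on the targets; the master step blends the labels with the current predictions (weight beta, a hypothetical parameter) and clips the result to the bound. This is only meant to show the shape of the loop, not the paper's formulation of the master problem.

```python
def fit_line(xs, ys):
    """Learner step: ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def train_with_cap(xs, ys, cap, beta=1.0, iters=20):
    """Alternate a master step (adjust targets) and a learner step (refit)."""
    slope, intercept = fit_line(xs, ys)
    for _ in range(iters):
        preds = [slope * x + intercept for x in xs]
        # Master step: stay close to both labels and predictions,
        # then enforce the constraint target <= cap by clipping.
        targets = [min(cap, (y + beta * p) / (1.0 + beta))
                   for y, p in zip(ys, preds)]
        # Learner step: the unmodified training routine, new targets.
        slope, intercept = fit_line(xs, targets)
    return slope, intercept


slope, intercept = train_with_cap([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 3.0], cap=2.0)
```

The learner never sees the constraint directly, so the final predictions are pulled toward the bound without being guaranteed to satisfy it exactly, in line with the approximate satisfaction described in the abstract.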
FairRec: Two-Sided Fairness for Personalized Recommendations in Two-Sided Platforms
Patro, Gourab K., Biswas, Arpita, Ganguly, Niloy, Gummadi, Krishna P., Chakraborty, Abhijnan
We investigate the problem of fair recommendation in the context of two-sided online platforms, comprising customers on one side and producers on the other. Traditionally, recommendation services in these platforms have focused on maximizing customer satisfaction by tailoring the results according to the personalized preferences of individual customers. However, our investigation reveals that such customer-centric design may lead to unfair distribution of exposure among the producers, which may adversely impact their well-being. On the other hand, a producer-centric design might become unfair to the customers. Thus, we consider fairness issues that span both customers and producers. Our approach involves a novel mapping of the fair recommendation problem to a constrained version of the problem of fairly allocating indivisible goods. Our proposed FairRec algorithm guarantees at least Maximin Share (MMS) of exposure for most of the producers and Envy-Free up to One item (EF1) fairness for every customer. Extensive evaluations over multiple real-world datasets show the effectiveness of FairRec in ensuring two-sided fairness while incurring a marginal loss in the overall recommendation quality.
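To make the customer-side guarantee concrete, the sketch below checks Envy-Freeness up to One item (EF1) for a given allocation under additive valuations. It uses the standard definition from the fair-division literature; the function name and the data layout are assumptions for this example, and it does not implement the FairRec algorithm itself.

```python
def is_ef1(valuations, allocation):
    """Check EF1 under additive valuations.

    valuations[i][g] is agent i's value for item g.
    allocation[i] is the list of items given to agent i.
    Agent i does not envy agent j "up to one item" if removing the item
    that i values most from j's bundle leaves a bundle worth no more
    to i than i's own bundle.
    """
    for i, own in enumerate(allocation):
        own_value = sum(valuations[i][g] for g in own)
        for j, other in enumerate(allocation):
            if i == j or not other:
                continue
            other_values = [valuations[i][g] for g in other]
            if own_value < sum(other_values) - max(other_values):
                return False
    return True


# Two agents, three items (integer values keep the comparison exact).
balanced = is_ef1([[3, 2, 1], [1, 2, 3]], [[0], [1, 2]])
lopsided = is_ef1([[1, 1, 1], [1, 1, 1]], [[], [0, 1, 2]])
```

The first allocation passes the check, while giving all three items to one agent does not.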
Injecting Domain Knowledge in Neural Networks: a Controlled Experiment on a Constrained Problem
Silvestri, Mattia, Lombardi, Michele, Milano, Michela
Given enough data, Deep Neural Networks (DNNs) are capable of learning complex input-output relations with high accuracy. In several domains, however, data is scarce or expensive to retrieve, while a substantial amount of expert knowledge is available. It seems reasonable that if we can inject this additional information in the DNN, we could ease the learning process. One such case is that of Constraint Problems, for which declarative approaches exist and pure ML solutions have obtained mixed success. Using a classical constrained problem as a case study, we perform controlled experiments to probe the impact of progressively adding domain and empirical knowledge in the DNN. Our results are very encouraging, showing that (at least in our setup) embedding domain knowledge at training time can have a considerable effect and that a small amount of empirical knowledge is sufficient to obtain practically useful results.
Forming Diverse Teams from Sequentially Arriving People
Ahmed, Faez, Dickerson, John, Fuge, Mark
Collaborative work often benefits from having teams or organizations with heterogeneous members. In this paper, we present a method to form such diverse teams from people arriving sequentially over time. We define a monotone submodular objective function that combines the diversity and quality of a team and propose an algorithm to maximize the objective while satisfying multiple constraints. This allows us to balance both how diverse the team is and how well it can perform the task at hand. Using crowd experiments, we show that, in practice, the algorithm leads to large gains in team diversity. Using simulations, we show how to quantify the additional cost of forming diverse teams and how to address the problem of simultaneously maximizing diversity for several attributes (e.g., country of origin, gender). Our method has applications in collaborative work ranging from team formation and the assignment of workers to teams in crowdsourcing to reviewer allocation for journal papers arriving sequentially. Our code is publicly accessible for further research.
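The kind of objective described here can be illustrated with a toy example. The sketch below assumes an objective equal to the number of distinct attribute values covered by the team plus a weighted sum of member quality, which is monotone submodular when qualities are non-negative. It then applies the textbook offline greedy rule under a team-size limit. This is not the paper's algorithm for sequential arrivals; the names and the quality weight are hypothetical.

```python
def team_value(team, people, quality_weight=0.5):
    """Diversity (distinct attribute values covered) plus weighted quality."""
    covered = {(attr, val)
               for p in team
               for attr, val in people[p]["attrs"].items()}
    quality = sum(people[p]["quality"] for p in team)
    return len(covered) + quality_weight * quality


def greedy_team(people, size):
    """Offline greedy: repeatedly add the person with the largest marginal gain."""
    team = []
    remaining = set(people)
    while remaining and len(team) < size:
        base = team_value(team, people)
        # sorted() makes tie-breaking deterministic.
        best = max(sorted(remaining),
                   key=lambda p: team_value(team + [p], people) - base)
        team.append(best)
        remaining.remove(best)
    return team


# Hypothetical candidates.
candidates = {
    "a": {"attrs": {"country": "US", "gender": "F"}, "quality": 0.9},
    "b": {"attrs": {"country": "US", "gender": "F"}, "quality": 0.8},
    "c": {"attrs": {"country": "IN", "gender": "M"}, "quality": 0.5},
}
team = greedy_team(candidates, size=2)
```

With these candidates the greedy rule picks "a" first and then "c" rather than the higher-quality "b", because "c" adds two attribute values the team does not yet cover.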
Deep Reinforcement Learning with Linear Quadratic Regulator Regions
Fernandez, Gabriel I., Togashi, Colin, Hong, Dennis W., Yang, Lin F.
Practitioners often rely on compute-intensive domain randomization to ensure reinforcement learning policies trained in simulation can robustly transfer to the real world. Due to unmodeled nonlinearities in the real system, however, even such simulated policies can still fail to perform stably enough to acquire experience in real environments. In this paper we propose a novel method that guarantees a stable region of attraction for the output of a policy trained in simulation, even for highly nonlinear systems. Our core technique is to use "bias-shifted" neural networks for constructing the controller and training the network in the simulator. The modified neural networks not only capture the nonlinearities of the system but also provably preserve linearity in a certain region of the state space and thus can be tuned to resemble a linear quadratic regulator that is known to be stable for the real system. We have tested our new method by transferring simulated policies for a swing-up inverted pendulum to real systems and demonstrated its efficacy.
Sony patents new motion sensing controller that hints at new PlayStation VR headset
This month, Sony filed a new patent for a virtual reality game controller that will support motion controls and have sensors that can automatically detect the player's finger position when they grip it. The patent doesn't indicate what game console or device it's intended to work with, but says only that it will work with a 'home-use game machine' that 'detects movement of a user's hand.' The large candy bar-shaped device will have a thin strip on the back that will be used to detect exact finger position. A patent filed in Japan shows a possible new design for Sony's next VR motion controller, with a sensor strip that tracks individual finger movement and position. The patent, which was published in Japanese and roughly translated by Google, says the sensors will not just detect finger position but the 'bending and stretching of each finger,' suggesting it might let game designers use specific finger movements as a way to control a game. The front of the controller has a traditional joystick surrounded by four face buttons, according to a report on Upload VR.
Incredible footage of 1911 New York City is colorized by artificial intelligence in high resolution
The 1911 video entitled 'A Trip Through New York City' has been brought back to life more than a hundred years later by artificial intelligence. Shot by a Swedish film production company, the black and white footage has been restored with neural networks to create a colorized, sharper version of the film. The eight-minute clip transports viewers back in time to the Statue of Liberty, Battery Park, the New York Harbor and the famous Flatiron Building on Fifth Avenue. YouTuber Denis Shiryaev posted the new video on his site, where it is now in 4K quality at 60 frames per second. This 'upscaled' footage was created using neural network-powered algorithms such as Topaz Labs' Gigapixel AI and DAIN.
Cisco IoT Platform Gains Machine Learning - SDxCentral
Cisco today introduced machine learning capabilities and tighter integration between service providers and vendors in its IoT management platform. Cisco IoT Control Center now includes machine learning models to identify anomalies and resolve problems before they impact IoT services. The feature also enables service providers to alert customers of errant or otherwise unused devices, and therefore implement greater control over connected devices. Cisco's IoT platform is largely the result of its $1.4 billion acquisition of Jasper in 2016, but the effort has grown considerably during the last four years. Cisco has partnerships and IoT management reselling agreements with 52 network operators and claims to be the No. 1 IoT management platform provider for connected cars.