Country
AI can now read emotions – should it?
In its annual report, the AI Now Institute, an interdisciplinary research center studying the societal implications of artificial intelligence, called for a ban on technology designed to recognize people's emotions in certain cases. Specifically, the researchers said affect recognition technology, also called emotion recognition technology, should not be used in decisions that "impact people's lives and access to opportunities," such as hiring decisions or pain assessments, because it is not sufficiently accurate and can lead to biased decisions. What is this technology, which is already being used and marketed, and why is it raising concerns? Researchers have been actively working on computer vision algorithms that can determine the emotions and intent of humans, along with making other inferences, for at least a decade. Facial expression analysis has been around since at least 2003.
Artificial intelligence yields new antibiotic
Using a machine-learning algorithm, MIT researchers have identified a powerful new antibiotic compound. In laboratory tests, the drug killed many of the world's most problematic disease-causing bacteria, including some strains that are resistant to all known antibiotics. It also cleared infections in two different mouse models. The computer model, which can screen more than a hundred million chemical compounds in a matter of days, is designed to pick out potential antibiotics that kill bacteria using different mechanisms than those of existing drugs. "We wanted to develop a platform that would allow us to harness the power of artificial intelligence to usher in a new age of antibiotic drug discovery," says James Collins, the Termeer Professor of Medical Engineering and Science in MIT's Institute for Medical Engineering and Science (IMES) and Department of Biological Engineering.
Neural Lyapunov Model Predictive Control
Mittal, Mayank, Gallieri, Marco, Quaglino, Alessio, Salehian, Seyed Sina Mirrazavi, Koutník, Jan
This paper presents Neural Lyapunov MPC, an algorithm to alternately train a Lyapunov neural network and a stabilising constrained Model Predictive Controller (MPC), given a neural network model of the system dynamics. This extends recent works on Lyapunov networks to be able to train solely from expert demonstrations of one-step transitions. The learned Lyapunov network is used as the value function for the MPC in order to guarantee stability and extend the stable region. Formal results are presented on the existence of a set of MPC parameters, such as discount factors, that guarantees stability with a horizon as short as one. Robustness margins are also discussed and existing performance bounds on value function MPC are extended to the case of imperfect models. The approach is tested on unstable non-linear continuous control tasks with hard constraints. Results demonstrate that, when a neural network trained on short sequences is used for predictions, a one-step horizon Neural Lyapunov MPC can successfully reproduce the expert behaviour and significantly outperform longer horizon MPCs.
Double Explore-then-Commit: Asymptotic Optimality and Beyond
Jin, Tianyuan, Xu, Pan, Xiao, Xiaokui, Gu, Quanquan
We study the two-armed bandit problem with subGaussian rewards. The explore-then-commit (ETC) strategy, which consists of an exploration phase followed by an exploitation phase, is one of the most widely used algorithms in a variety of online decision applications. Nevertheless, it has been shown in Garivier et al. (2016) that ETC is suboptimal in the asymptotic sense as the horizon grows, and thus, is worse than fully sequential strategies such as Upper Confidence Bound (UCB). In this paper, we argue that a variant of ETC algorithm can actually achieve the asymptotically optimal regret bounds for multi-armed bandit problems as UCB-type algorithms do. Specifically, we propose a double explore-then-commit (DETC) algorithm that has two exploration and exploitation phases. We prove that DETC achieves the asymptotically optimal regret bound as the time horizon goes to infinity. To our knowledge, DETC is the first non-fully-sequential algorithm that achieves such asymptotic optimality. In addition, we extend DETC to batched bandit problems, where (i) the exploration process is split into a small number of batches and (ii) the round complexity is of central interest. We prove that a batched version of DETC can achieve the asymptotic optimality with only constant round complexity. This is the first batched bandit algorithm that can attain asymptotic optimality in terms of both regret and round complexity.
Few-Shot Learning via Learning the Representation, Provably
Du, Simon S., Hu, Wei, Kakade, Sham M., Lee, Jason D., Lei, Qi
This paper studies few-shot learning via representation learning, where one uses $T$ source tasks with $n_1$ data per task to learn a representation in order to reduce the sample complexity of a target task for which there is only $n_2 (\ll n_1)$ data. Specifically, we focus on the setting where there exists a good \emph{common representation} between source and target, and our goal is to understand how much of a sample size reduction is possible. First, we study the setting where this common representation is low-dimensional and provide a fast rate of $O\left(\frac{\mathcal{C}\left(\Phi\right)}{n_1T} + \frac{k}{n_2}\right)$; here, $\Phi$ is the representation function class, $\mathcal{C}\left(\Phi\right)$ is its complexity measure, and $k$ is the dimension of the representation. When specialized to linear representation functions, this rate becomes $O\left(\frac{dk}{n_1T} + \frac{k}{n_2}\right)$ where $d (\gg k)$ is the ambient input dimension, which is a substantial improvement over the rate without using representation learning, i.e. over the rate of $O\left(\frac{d}{n_2}\right)$. Second, we consider the setting where the common representation may be high-dimensional but is capacity-constrained (say in norm); here, we again demonstrate the advantage of representation learning in both high-dimensional linear regression and neural network learning. Our results demonstrate representation learning can fully utilize all $n_1T$ samples from source tasks.
Learning to Simulate Complex Physics with Graph Networks
Sanchez-Gonzalez, Alvaro, Godwin, Jonathan, Pfaff, Tobias, Ying, Rex, Leskovec, Jure, Battaglia, Peter W.
Here we present a general framework for learning simulation, and provide a single model implementation that yields state-of-the-art performance across a variety of challenging physical domains, involving fluids, rigid solids, and deformable materials interacting with one another. Our framework---which we term "Graph Network-based Simulators" (GNS)---represents the state of a physical system with particles, expressed as nodes in a graph, and computes dynamics via learned message-passing. Our results show that our model can generalize from single-timestep predictions with thousands of particles during training, to different initial conditions, thousands of timesteps, and at least an order of magnitude more particles at test time. Our model was robust to hyperparameter choices across various evaluation metrics: the main determinants of long-term performance were the number of message-passing steps, and mitigating the accumulation of error by corrupting the training data with noise. Our GNS framework is the most accurate general-purpose learned physics simulator to date, and holds promise for solving a wide range of complex forward and inverse problems.
Hyperbolic Minesweeper is in P
In (a), the default settings are used (bitruncated order-3 heptagonal tessellation, numbers of adjacent mines are color-coded; some of the mines that the players is sure of are marked red). In (b) we play on an order-3 heptagonal tessellation, and numbers are shown. Minesweeper is a popular game included with many computer systems; it also exists in the puzzle form. In the puzzle form, every cell in a square grid either contains a number or is empty.
Split-BOLFI for for misspecification-robust likelihood free inference in high dimensions
Thomas, Owen, Pesonen, Henri, Sá-Leão, Raquel, de Lencastre, Hermínia, Kaski, Samuel, Corander, Jukka
Likelihood-free inference for simulator-based statistical models has recently grown rapidly from its infancy to a useful tool for practitioners. However, models with more than a very small number of parameters as the target of inference have remained an enigma, in particular for the approximate Bayesian computation (ABC) community. To advance the possibilities for performing likelihood-free inference in high-dimensional parameter spaces, here we introduce an extension of the popular Bayesian optimisation based approach to approximate discrepancy functions in a probabilistic manner which lends itself to an efficient exploration of the parameter space. Our method achieves computational scalability by using separate acquisition procedures for the discrepancies defined for different parameters. These efficient high-dimensional simulation acquisitions are combined with exponentiated loss-likelihoods to provide a misspecification-robust characterisation of the marginal posterior distribution for all model parameters. The method successfully performs computationally efficient inference in a 100-dimensional space on canonical examples and compares favourably to existing Copula-ABC methods. We further illustrate the potential of this approach by fitting a bacterial transmission dynamics model to daycare centre data, which provides biologically coherent results on the strain competition in a 30-dimensional parameter space.
Fast local linear regression with anchor regularization
Petrovich, Mathis, Yamada, Makoto
The regression problem is an important problem in machine learning, data mining, and statistics, and several research works have investigated it in the past decades. Examples include stock price prediction [12, 25], age prediction from RNA-seq [6] or images [11], sentimental analysis [15, 18], or house prediction [7] to name a few. The most widely used regression approach is based on a linear model including the ordinary least squares (OLS), Ridge regression, least absolute shrinkage and selection operator (Lasso) [19], and elastic net [26]. Because these linear models are extremely simple and can be interpreted by simply checking the linear coefficients of the variables; these approaches are in particular used in practice. However, one of the limitations of linear models is that they cannot handle complex nonlinear data; the performance can be significantly degraded if we apply these linear methods to process complex data such as the gene expression data used heavily in biology and healthcare. To handle complex data, researchers tend to use kernel methods such as kernel ridge regression (KRR) and support vector regression (SVR) [16].
The Sample Complexity of Meta Sparse Regression
This paper addresses the meta-learning problem in sparse linear regression with infinite tasks. We assume that the learner can access several similar tasks. The goal of the learner is to transfer knowledge from the prior tasks to a similar but novel task. For p parameters, size of the support set k , and l samples per task, we show that T \in O (( k log(p) ) /l ) tasks are sufficient in order to recover the common support of all tasks. With the recovered support, we can greatly reduce the sample complexity for estimating the parameter of the novel task, i.e., l \in O (1) with respect to T and p . We also prove that our rates are minimax optimal. A key difference between meta-learning and the classical multi-task learning, is that meta-learning focuses only on the recovery of the parameters of the novel task, while multi-task learning estimates the parameter of all tasks, which requires l to grow with T . Instead, our efficient meta-learning estimator allows for l to be constant with respect to T (i.e., few-shot learning).