Country
Confidence Intervals for Policy Evaluation in Adaptive Experiments
Hadad, Vitor, Hirshberg, David A., Zhan, Ruohan, Wager, Stefan, Athey, Susan
Adaptive experiments can result in considerable cost savings in multi-armed trials by enabling analysts to quickly focus on the most promising alternatives. Most existing work on adaptive experiments (which include multi-armed bandits) has focused maximizing the speed at which the analyst can identify the optimal arm and/or minimizing the number of draws from sub-optimal arms. In many scientific settings, however, it is not only of interest to identify the optimal arm, but also to perform a statistical analysis of the data collected from the experiment. Naive approaches to statistical inference with adaptive inference fail because many commonly used statistics (such as sample means or inverse propensity weighting) do not have an asymptotically Gaussian limiting distribution centered on the estimate, and so confidence intervals constructed from these statistics do not have correct coverage. But, as shown in this paper, carefully designed data-adaptive weighting schemes can be used to overcome this issue and restore a relevant central limit theorem, enabling hypothesis testing. We validate the accuracy of the resulting confidence intervals in numerical experiments.
Information-Theoretic Generalization Bounds for SGLD via Data-Dependent Estimates
Negrea, Jeffrey, Haghifam, Mahdi, Dziugaite, Gintare Karolina, Khisti, Ashish, Roy, Daniel M.
Information-Theoretic Generalization Bounds for SGLD via Data-Dependent EstimatesJeffrey Negrea University of Toronto, V ector Institute Mahdi Haghifam University of Toronto, Element AI Gintare Karolina Dziugaite Element AI Ashish Khisti University of Toronto Daniel M. Roy University of Toronto, V ector Institute Abstract In this work, we improve upon the stepwise analysis of noisy iterative learning algorithms initiated by Pensia, Jog, and Loh (2018) and recently extended by Bu, Zou, and V eeravalli (2019). Our main contributions are significantly improved mutual information bounds for Stochastic Gradient Langevin Dynamics via data-dependent estimates. Our approach is based on the variational characterization of mutual information and the use of data-dependent priors that forecast the mini-batch gradient based on a subset of the training samples. Our approach is broadly applicable within the information-theoretic framework of Russo and Zou (2015) and Xu and Raginsky (2017). Our bound can be tied to a measure of flatness of the empirical risk surface. As compared with other bounds that depend on the squared norms of gradients, empirical investigations show that the terms in our bounds are orders of magnitude smaller. 1 Introduction Stochastic subgradient methods, especially stochastic gradient descent (SGD), are at the core of recent advances in deep-learning practice. Despite some progress, developing a precise understanding of generalization error for that class of algorithms remains wide open. Concurrently, there has been steady progress for noisy variants of SGD, such as stochastic gradient Langevin dynamics (SGLD) [13, 25, 33] and its full-batch counterpart, the Langevin algorithm [13]. The introduction of Gaussian noise to the iterates of SGD expands the set of theoretical frameworks that can be brought to bear on the study of generalization. In pioneering work, Raginsky, Rakhlin, and Telgarsky [25] exploit the fact that SGLD approximates Langevin diffusion, a continuous time Markov process, in the small step size limit. One drawback of this and related analyses involving Markov processes is the reliance on mixing. We hypothesize that SGLD is not mixing in practice, so results based upon mixing may not be representative of empirical performance. In recent work, Pensia, Jog, and Loh [23] perform a stepwise analysis of a family of noisy iterative algorithms that includes SGLD and the Langevin algorithm. At the foundation of this work is the framework of Russo and Zou [28] and Xu and Raginsky [34], where mean generalization error is controlled in terms of the mutual information between the dataset and the learned parameters.
Biconditional Generative Adversarial Networks for Multiview Learning with Missing Views
Doinychko, Anastasiia, Amini, Massih-Reza
In this paper, we present a conditional GAN with two generators and a common discriminator for multiview learning problems where observations have two views, but one of them may be missing for some of the training samples. This is for example the case for multilingual collections where documents are not available in all languages. Some studies tackled this problem by assuming the existence of view generation functions to approximately complete the missing views; for example Machine Translation to translate documents into the missing languages. These functions generally require an external resource to be set and their quality has a direct impact on the performance of the learned multiview classifier over the completed training set. Our proposed approach addresses this problem by jointly learning the missing views and the multiview classifier using a tripartite game with two generators and a discriminator. Each of the generators is associated to one of the views and tries to fool the discriminator by generating the other missing view conditionally on the corresponding observed view. The discriminator then tries to identify if for an observation, one of its views is completed by one of the generators or if both views are completed along with its class. Our results on a subset of Reuters RCV1/RCV2 collections show that the discriminator achieves significant classification performance; and that the generators learn the missing views with high quality without the need of any consequent external resource.
Towards Optimal and Efficient Best Arm Identification in Linear Bandits
Zaki, Mohammadi, Mohan, Avinash, Gopalan, Aditya
We give a new algorithm for best arm identification in linearly parameterised bandits in the fixed confidence setting. The algorithm generalises the well-known LUCB algorithm of Kalyanakrishnan et al. (2012) by playing an arm which minimises a suitable notion of geometric overlap of the statistical confidence set for the unknown parameter, and is fully adaptive and computationally efficient as compared to several state-of-the methods. We theoretically analyse the sample complexity of the algorithm for problems with two and three arms, showing optimality in many cases. Numerical results indicate favourable performance over other algorithms with which we compare.
Graph Neural News Recommendation with Long-term and Short-term Interest Modeling
Hu, Linmei, Li, Chen, Shi, Chuan, Yang, Cheng, Shao, Chao
With the information explosion of news articles, personalized news recommendation has become important for users to quickly find news that they are interested in. Existing methods on news recommendation mainly include collaborative filtering methods which rely on direct user-item interactions and content based methods which characterize the content of user reading history. Although these methods have achieved good performances, they still suffer from data sparse problem, since most of them fail to extensively exploit high-order structure information (similar users tend to read similar news articles) in news recommendation systems. In this paper, we propose to build a heterogeneous graph to explicitly model the interactions among users, news and latent topics. The incorporated topic information would help indicate a user's interest and alleviate the sparsity of user-item interactions. Then we take advantage of graph neural networks to learn user and news representations that encode high-order structure information by propagating embeddings over the graph. The learned user embeddings with complete historic user clicks capture the users' long-term interests. We also consider a user's short-term interest using the recent reading history with an attention based LSTM model. Experimental results on real-world datasets show that our proposed model significantly outperforms state-of-the-art methods on news recommendation.
Sparsity through evolutionary pruning prevents neuronal networks from overfitting
Gerum, Richard C., Erpenbeck, Andrรฉ, Krauss, Patrick, Schilling, Achim
Modern Machine learning techniques take advantage of the exponentially rising calculation power in new generation processor units. Thus, the number of parameters which are trained to resolve complex tasks was highly increased over the last decades. However, still the networks fail - in contrast to our brain - to develop general intelligence in the sense of being able to solve several complex tasks with only one network architecture. This could be the case because the brain is not a randomly initialized neural network, which has to be trained by simply investing a lot of calculation power, but has from birth some fixed hierarchical structure. To make progress in decoding the structural basis of biological neural networks we here chose a bottom-up approach, where we evolutionarily trained small neural networks in performing a maze task. This simple maze task requires dynamical decision making with delayed rewards. We were able to show that during the evolutionary optimization random severance of connections lead to better generalization performance of the networks compared to fully connected networks. We conclude that sparsity is a central property of neural networks and should be considered for modern Machine learning approaches.
Multi-domain Dialogue State Tracking as Dynamic Knowledge Graph Enhanced Question Answering
Multi-domain dialogue state tracking (DST) is a critical component for conversational AI systems. The domain ontology (i.e., specification of domains, slots, and values) of a conversational AI system is generally incomplete, making the capability for DST models to generalize to new slots, values, and domains during inference imperative. In this paper, we propose to model multi-domain DST as a question answering problem, referred to as Dialogue State Tracking via Question Answering (DSTQA). Within DSTQA, each turn generates a question asking for the value of a (domain, slot) pair, thus making it naturally extensible to unseen domains, slots, and values. Additionally, we use a dynamically-evolving knowledge graph to explicitly learn relationships between (domain, slot) pairs. Our model has a 5.80% and 12.21% relative improvement over the current state-of-the-art model on MultiWOZ 2.0 and MultiWOZ 2.1 datasets, respectively. Additionally, our model consistently outperforms the state-of-the-art model in domain adaptation settings.
Can Neural Networks Learn Symbolic Rewriting?
Piotrowski, Bartosz, Urban, Josef, Brown, Chad E., Kaliszyk, Cezary
This work investigates if the current neural architectures are adequate for learning symbolic rewriting. Two kinds of data sets are proposed for this research -- one based on automated proofs and the other being a synthetic set of polynomial terms. The experiments with use of the current neural machine translation models are performed and its results are discussed. Ideas for extending this line of research are proposed and its relevance is motivated.
White-Box Target Attack for EEG-Based BCI Regression Problems
Meng, Lubin, Lin, Chin-Teng, Jung, Tzyy-Ring, Wu, Dongrui
Machine learning has achieved great success in many applications, including electroencephalogram (EEG) based brain-computer interfaces (BCIs). Unfortunately, many machine learning models are vulnerable to adversarial examples, which are crafted by adding deliberately designed perturbations to the original inputs. Many adversarial attack approaches for classification problems have been proposed, but few have considered target adversarial attacks for regression problems. This paper proposes two such approaches. More specifically, we consider white-box target attacks for regression problems, where we know all information about the regression model to be attacked, and want to design small perturbations to change the regression output by a pre-determined amount. Experiments on two BCI regression problems verified that both approaches are effective. Moreover, adversarial examples generated from both approaches are also transferable, which means that we can use adversarial examples generated from one known regression model to attack an unknown regression model, i.e., to perform black-box attacks. To our knowledge, this is the first study on adversarial attacks for EEG-based BCI regression problems, which calls for more attention on the security of BCI systems.
A Human-in-the-loop Framework to Construct Context-dependent Mathematical Formulations of Fairness
Yaghini, Mohammad, Heidari, Hoda, Krause, Andreas
Despite the recent surge of interest in designing and guaranteeing mathematical formulations of fairness, virtually all existing notions of algorithmic fairness fail to be adaptable to the intricacies and nuances of the decision-making context at hand. We argue that capturing such factors is an inherently human task, as it requires knowledge of the social background in which machine learning tools impact real people's outcomes and a deep understanding of the ramifications of automated decisions for decision subjects and society. In this work, we present a framework to construct a context-dependent mathematical formulation of fairness utilizing people's judgment of fairness. We utilize the theoretical model of Heidari et al. (2019)---which shows that most existing formulations of algorithmic fairness are special cases of economic models of Equality of Opportunity (EOP)---and present a practical human-in-the-loop approach to pinpoint the fairness notion in the EOP family that best captures people's perception of fairness in the given context. To illustrate our framework, we run human-subject experiments designed to learn the parameters of Heidari et al.'s EOP model (including circumstance, desert, and utility) in a hypothetical recidivism decision-making scenario. Our work takes an initial step toward democratizing the formulation of fairness and utilizing human-judgment to tackle a fundamental shortcoming of automated decision-making systems: that the machine on its own is incapable of understanding and processing the human aspects and social context of its decisions.