On Consequentialism and Fairness

arXiv.org Machine Learning

In recent years, computer scientists have increasingly come to recognize that artificial intelligence (AI) systems have the potential to create harmful consequences. Especially within machine learning, there have been numerous efforts to formally characterize various notions of fairness and develop algorithms to satisfy these criteria. However, most of this research has proceeded without any nuanced discussion of ethical foundations. Partly as a response, there have been several recent calls to think more broadly about the ethical implications of AI (Barabas et al., 2018; Hu and Chen, 2018b; Torresen, 2018; Green, 2019). Among the most prominent approaches to ethics within philosophy is a highly influential position known as consequentialism. Roughly speaking, the consequentialist believes that outcomes are all that matter, and that people should therefore endeavour to act so as to produce the best consequences, based on an impartial perspective as to what is best. Although there are numerous difficulties with consequentialism in practice (see §4), it nevertheless provides a clear and principled foundation from which to critique proposals which fall short of its ideals. In this paper, we analyze the literature on fairness within machine learning, and show how it largely depends on assumptions which the consequentialist perspective reveals immediately to be problematic. In particular, we make the following contributions:
- We provide an accessible overview of the main ideas of consequentialism (§3), as well as a discussion of its difficulties (§4), with a special emphasis on computational limitations.
- We review the dominant ideas about fairness in the machine learning literature (§5), and provide the first critique of these ideas explicitly from the perspective of consequentialism (§6).
- We conclude with a broader discussion of the ethical issues raised by learning and randomization, highlighting future directions for both AI and consequentialism (§7).


Reject Illegal Inputs with Generative Classifier Derived from Any Discriminative Classifier

arXiv.org Machine Learning

Generative classifiers have shown promise for detecting illegal inputs, including adversarial examples and out-of-distribution samples. Supervised Deep Infomax (SDIM) is a scalable end-to-end framework for learning generative classifiers. In this paper, we propose a modification of SDIM termed SDIM-logit. Instead of training a generative classifier from scratch, SDIM-logit first takes as input the logits produced by any given discriminative classifier and generates logit representations; a generative classifier is then derived by imposing statistical constraints on the logit representations. SDIM-logit thus inherits the performance of the discriminative classifier without loss. SDIM-logit incurs a negligible number of additional parameters and can be trained efficiently with the base classifier fixed. We perform classification with rejection, where test samples whose class conditionals fall below pre-chosen thresholds are rejected without prediction. Experiments on illegal inputs, including adversarial examples, samples with common corruptions, and out-of-distribution (OOD) samples, show that, when allowed to reject a portion of test samples, SDIM-logit significantly improves performance on the remaining test sets.
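
As an illustration of the classification-with-rejection step, here is a minimal sketch in Python: per-class thresholds are applied to class-conditional scores, and a sample whose winning score falls below its class's threshold is rejected. The threshold-selection heuristic mentioned in the comment is an assumption for illustration, not the paper's exact procedure.

```python
import numpy as np

def classify_with_rejection(log_probs, thresholds):
    """Predict the argmax class, or reject (-1) when the winning
    class-conditional score falls below that class's threshold.

    log_probs  : (n_samples, n_classes) class-conditional scores
    thresholds : (n_classes,) pre-chosen per-class thresholds
    """
    preds = log_probs.argmax(axis=1)
    winning = log_probs[np.arange(len(preds)), preds]
    return np.where(winning < thresholds[preds], -1, preds)

# Toy usage; choosing each threshold as a low percentile of that class's
# scores on held-out in-distribution data is one common heuristic.
rng = np.random.default_rng(0)
scores = rng.normal(size=(5, 3))
print(classify_with_rejection(scores, np.full(3, -0.5)))
```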


Thresholds of descending algorithms in inference problems

arXiv.org Machine Learning

We review recent works [1, 2, 3] on analyzing the dynamics of gradient-based algorithms in a prototypical statistical inference problem. Using methods and insights from the physics of glassy systems, these works showed how to understand quantitatively and qualitatively the performance of gradient-based algorithms. Here we review the key results and their interpretation, in nontechnical terms accessible to a wide audience of physicists, and place them in the context of related work. Keywords: analysis of algorithms, statistical inference, spin glasses, machine learning.
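
To make the setting concrete, below is a minimal sketch of gradient descent on a spiked Wigner model, a simplified stand-in for the spiked matrix-tensor problems analyzed in [1, 2, 3]; all parameter values are illustrative.

```python
import numpy as np

# A spiked Wigner model: y = (lam/n) x* x*^T + GOE noise, a simplified
# stand-in for the spiked matrix-tensor problems of the cited works.
rng = np.random.default_rng(0)
n, lam, lr, steps = 200, 3.0, 0.5, 2000

x_star = rng.choice([-1.0, 1.0], size=n)            # planted signal
g = rng.normal(size=(n, n))
y = lam / n * np.outer(x_star, x_star) + (g + g.T) / np.sqrt(2 * n)

# Gradient descent on the loss ||y - (lam/n) x x^T||_F^2 from a random start.
x = rng.normal(size=n)
for _ in range(steps):
    grad = -4 * lam / n * ((y - lam / n * np.outer(x, x)) @ x)
    x -= lr * grad

overlap = abs(x @ x_star) / (np.linalg.norm(x) * np.sqrt(n))
print(f"overlap with planted signal: {overlap:.3f}")
```

Above the algorithmic threshold (here lam is chosen well above it), gradient descent escapes the uninformative region and the overlap with the planted signal becomes large; the cited works characterize exactly where this stops happening.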


Reasoning on Knowledge Graphs with Debate Dynamics

arXiv.org Machine Learning

We propose a novel method for automatic reasoning on knowledge graphs based on debate dynamics. The main idea is to frame the task of triple classification as a debate game between two reinforcement learning agents, which extract arguments -- paths in the knowledge graph -- with the goal of promoting the fact being true (thesis) or the fact being false (antithesis), respectively. Based on these arguments, a binary classifier, called the judge, decides whether the fact is true or false. The two agents can be considered sparse, adversarial feature generators that present interpretable evidence for either the thesis or the antithesis. In contrast to other black-box methods, the arguments allow users to understand the decision of the judge. Since the focus of this work is to create an explainable method that maintains competitive predictive accuracy, we benchmark our method on the triple classification and link prediction tasks. We find that our method outperforms several baselines on the benchmark datasets FB15k-237, WN18RR, and Hetionet. We also conduct a survey and find that the extracted arguments are informative for users.
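
As a structural sketch only, the following Python snippet shows one way the debate framing could be wired up: each agent supplies argument feature vectors (random stand-ins here for encoded graph paths), and a logistic "judge" scores the triple. The pooling scheme, dimensions, and untrained weights are assumptions, not the paper's architecture.

```python
import numpy as np

# Two agents supply argument feature vectors (random stand-ins for
# encoded knowledge-graph paths); a logistic judge pools each side's
# arguments and scores the triple. All shapes/weights are illustrative.
rng = np.random.default_rng(0)
d = 16                                   # assumed argument embedding size

def judge(pro_args, con_args, w, b):
    z = np.concatenate([pro_args.sum(axis=0), con_args.sum(axis=0)])
    return 1.0 / (1.0 + np.exp(-(w @ z + b)))   # P(triple is true)

pro = rng.normal(size=(3, d))            # 3 arguments for the thesis
con = rng.normal(size=(3, d))            # 3 arguments for the antithesis
w, b = rng.normal(size=2 * d), 0.0       # untrained judge parameters
print("judge's verdict, P(true) =", judge(pro, con, w, b))
```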


Continuous-Discrete Reinforcement Learning for Hybrid Control in Robotics

arXiv.org Machine Learning

Many real-world control problems involve both discrete decision variables - such as the choice of control modes, gear switching or digital outputs - and continuous decision variables - such as velocity setpoints, control gains or analogue outputs. However, when defining the corresponding optimal control or reinforcement learning problem, it is commonly approximated with a fully continuous or fully discrete action space. These simplifications aim at tailoring the problem to a particular algorithm or solver which may only support one type of action space. Alternatively, expert heuristics are used to remove discrete actions from an otherwise continuous space. In contrast, we propose to treat hybrid problems in their 'native' form by solving them with hybrid reinforcement learning, which optimizes over discrete and continuous actions simultaneously. In our experiments, we first demonstrate that the proposed approach efficiently solves such natively hybrid reinforcement learning problems. We then show, both in simulation and on robotic hardware, the benefits of removing possibly imperfect expert-designed heuristics. Lastly, hybrid reinforcement learning encourages us to rethink problem definitions. We propose reformulating control problems, e.g. by adding meta actions, to improve exploration or reduce mechanical wear and tear.
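
A minimal sketch of what a hybrid action space can look like in practice: a policy with a categorical head over discrete modes and a Gaussian head over continuous setpoints, sampled jointly. The linear heads and all shapes are illustrative assumptions.

```python
import numpy as np

# A policy step over a hybrid action space: a categorical head picks a
# discrete mode (e.g., a gear) while a Gaussian head emits continuous
# setpoints (e.g., velocities). Linear heads and shapes are illustrative.
rng = np.random.default_rng(0)

def hybrid_policy(obs, W_disc, W_mu, log_std):
    logits = W_disc @ obs
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                         # softmax over discrete modes
    mode = rng.choice(len(probs), p=probs)
    cont = rng.normal(W_mu @ obs, np.exp(log_std))
    return mode, cont

obs = rng.normal(size=8)
mode, cont = hybrid_policy(obs,
                           rng.normal(size=(3, 8)),   # 3 discrete modes
                           rng.normal(size=(2, 8)),   # 2 continuous outputs
                           np.zeros(2))
print("discrete:", mode, "continuous:", cont)
```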


Inter- and Intra-domain Knowledge Transfer for Related Tasks in Deep Character Recognition

arXiv.org Machine Learning

Pre-training a deep neural network on the ImageNet dataset is a common practice for training deep learning models, and generally yields improved performance and faster training times. The technique of pre-training on one task and then retraining on a new one is called transfer learning. We perform three sets of experiments with varying levels of similarity between source and target tasks to investigate the behaviour of different types of knowledge transfer. We transfer both parameters and features and analyse their behaviour. Our results demonstrate that no significant advantage is gained by using a transfer learning approach over a traditional machine learning approach for our character recognition tasks. This suggests that using transfer learning does not necessarily produce a better-performing model in all cases.
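
For readers unfamiliar with the setup, here is a minimal transfer-learning sketch in PyTorch (assuming torchvision >= 0.13; the pretrained weights are downloaded on first use): the ImageNet feature extractor is frozen and only a new classification head is retrained. The 62-class head (digits plus upper- and lowercase letters) is an assumption for a character-recognition task, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained backbone and freeze its transferred parameters.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
for p in model.parameters():
    p.requires_grad = False

# Replace the head for a hypothetical 62-class character task
# (digits + upper/lowercase letters) and train only the new head.
model.fc = nn.Linear(model.fc.in_features, 62)
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

x = torch.randn(4, 3, 224, 224)     # stand-in batch of character images
y = torch.randint(0, 62, (4,))
optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
print("one fine-tuning step, loss:", loss.item())
```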


Restricting the Flow: Information Bottlenecks for Attribution

arXiv.org Machine Learning

Attribution methods provide insights into the decision-making of machine learning models like artificial neural networks. For a given input sample, they assign a relevance score to each individual input variable, such as the pixels of an image. In this work we adapt the information bottleneck concept for attribution. By adding noise to intermediate feature maps we restrict the flow of information and can quantify (in bits) how much information image regions provide. We compare our method against ten baselines using three different metrics on VGG-16 and ResNet-50, and find that our methods outperform all baselines in five out of six settings. The method's information-theoretic foundation provides an absolute frame of reference for attribution values (bits) and a guarantee that regions scored close to zero are not necessary for the network's decision.
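
A minimal sketch of the per-feature information estimate such a bottleneck relies on: a feature map is blended with Gaussian noise, and the closed-form KL divergence from the pure-noise prior, converted to bits, quantifies how much signal survives. The fixed mask and the standard-normal noise prior are simplifying assumptions; the paper optimizes the bottleneck rather than fixing it.

```python
import numpy as np

# Blend a feature map with Gaussian noise, z = lam*f + (1-lam)*eps, and
# measure in bits how far z's distribution is from the pure-noise prior
# N(mu_noise, std_noise^2). Both distributions are Gaussian, so the KL
# divergence has the closed form used below.
def information_bits(f, lam, mu_noise=0.0, std_noise=1.0):
    mu_z = lam * f + (1 - lam) * mu_noise
    std_z = (1 - lam) * std_noise
    kl_nats = (np.log(std_noise / std_z)
               + (std_z**2 + (mu_z - mu_noise)**2) / (2 * std_noise**2)
               - 0.5)
    return kl_nats / np.log(2)      # nats -> bits

rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 8))                 # one intermediate feature map
lam = np.clip(np.abs(feat), 0.0, 0.99)         # fixed illustrative mask in [0, 1)
bits = information_bits(feat, lam)
print("information kept: %.1f bits in total" % bits.sum())
```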


Kernelized Support Tensor Train Machines

arXiv.org Machine Learning

Tensors, multi-dimensional data structures, have recently been exploited in the machine learning community. Traditional machine learning approaches are vector- or matrix-based and cannot handle tensorial data directly. In this paper, we propose a tensor train (TT)-based kernel technique for the first time, and apply it to the conventional support vector machine (SVM) for image classification. Specifically, we propose a kernelized support tensor train machine that accepts tensorial input and preserves the intrinsic kernel property. The main contributions are threefold. First, we propose a TT-based feature mapping procedure that maintains the TT structure in the feature space. Second, we demonstrate two ways to construct the TT-based kernel function while considering consistency with the TT inner product and preservation of information. Third, we show that it is possible to apply different kernel functions on different data modes. In principle, our method tensorizes the standard SVM in its input structure and kernel mapping scheme. Extensive experiments are performed on real-world tensor data, which demonstrate the superiority of the proposed scheme under few-sample high-dimensional inputs.
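
As background for the kernel construction, here is a minimal sketch of the TT inner product, the quantity consistency is measured against: two tensors in TT format are contracted core by core, without ever forming the full tensor. The random-core generator is illustrative.

```python
import numpy as np

# Inner product of two tensors in TT format: contract core by core,
# never materializing the full tensor. Cores have shape (r_left, n, r_right).
def tt_inner(cores_a, cores_b):
    acc = np.ones((1, 1))                       # running rank-space contraction
    for A, B in zip(cores_a, cores_b):
        acc = np.einsum('ab,ani,bnj->ij', acc, A, B)
    return acc.item()                           # boundary ranks are 1

def random_tt(shape, rank, rng):
    ranks = [1] + [rank] * (len(shape) - 1) + [1]
    return [rng.normal(size=(ranks[k], n, ranks[k + 1]))
            for k, n in enumerate(shape)]

rng = np.random.default_rng(0)
x, y = random_tt((4, 5, 6), 3, rng), random_tt((4, 5, 6), 3, rng)
print("TT inner product:", tt_inner(x, y))
```

An RBF-style kernel can then, for instance, be assembled from three such inner products via k(x, y) = exp(-gamma * (<x, x> - 2<x, y> + <y, y>)), one design choice among those the paper compares.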


Visual Machine Learning: Insight through Eigenvectors, Chladni patterns and community detection in 2D particulate structures

arXiv.org Machine Learning

Machine learning (ML) is quickly emerging as a powerful tool with diverse applications across an extremely broad spectrum of disciplines and commercial endeavors. Typically, ML is used as a black box that provides little illuminating rationalization of its output. In the current work, we aim to better understand the generic intuition underlying unsupervised ML, with a focus on physical systems. The systems studied here as test cases comprise six different 2-dimensional (2-D) particulate systems of varying complexity. It is noted that the findings of this study are generic to any unsupervised ML problem and are not restricted to materials systems alone. Three rudimentary unsupervised ML techniques are employed on the adjacency (connectivity) matrix of the six studied systems: (i) using the principal eigenvalue and eigenvectors of the adjacency matrix, (ii) spectral decomposition, and (iii) a Potts model based community detection technique in which a modularity function is maximized. We demonstrate that, while solving a completely classical problem, the ML techniques produce features that are distinctly connected to quantum mechanical solutions. Dissecting these features helps us to understand the deep connection between the classical non-linear world and the quantum mechanical linear world through the kaleidoscope of ML, which might have far-reaching consequences in both the arena of physical sciences and ML.
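
A minimal sketch of technique (i) on a toy system: build the adjacency matrix of a small 2-D particle grid and take its principal eigenpair; reshaped onto the grid, the eigenvector's nodal structure is the standing-wave (Chladni-like) pattern alluded to above. The grid graph is an illustrative stand-in for the paper's particulate structures.

```python
import numpy as np

# Adjacency matrix of a toy 5 x 5 grid of touching particles.
n = 5
idx = lambda i, j: i * n + j
A = np.zeros((n * n, n * n))
for i in range(n):
    for j in range(n):
        if i + 1 < n:
            A[idx(i, j), idx(i + 1, j)] = A[idx(i + 1, j), idx(i, j)] = 1
        if j + 1 < n:
            A[idx(i, j), idx(i, j + 1)] = A[idx(i, j + 1), idx(i, j)] = 1

vals, vecs = np.linalg.eigh(A)        # full spectral decomposition
principal = vecs[:, -1]               # eigenvector of the largest eigenvalue
print("principal eigenvalue:", round(vals[-1], 3))
print(principal.reshape(n, n).round(2))   # nodal pattern on the grid
```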


Using Data Imputation for Signal Separation in High Contrast Imaging

arXiv.org Machine Learning

With existing PSF construction methods, circumstellar signals (e.g., planets, circumstellar disks) are unavoidably altered by over-fitting and/or self-subtraction, making forward modeling a necessity to recover these signals. We present a forward-modeling-free solution to these problems: data imputation using sequential nonnegative matrix factorization (DI-sNMF). DI-sNMF first converts this signal separation problem into a "missing data" problem in statistics by flagging the regions which host circumstellar signals as missing data, then attributes PSF signals to these regions. We mathematically prove that it alters circumstellar signals negligibly when the imputation region is relatively small, which thus enables precise measurement of these circumstellar objects. We apply it to simulated point-source and circumstellar disk observations to demonstrate their proper recovery. We apply it to Gemini Planet Imager (GPI) K1-band observations of the debris disk surrounding HR 4796A, finding a tentative trend that the dust is more forward scattering as the wavelength increases. We expect DI-sNMF to be applicable to other general scenarios where the separation of signals is needed. Keywords: techniques: image processing -- protoplanetary disks -- stars: imaging -- stars: individual: HD 38393 -- stars: individual: HR 4796A
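
To illustrate the imputation idea, here is a minimal sketch of NMF with missing-data flags: multiplicative updates are weighted by a binary mask so that flagged (circumstellar) pixels never constrain the PSF model, which afterwards fills them in. This is generic masked NMF under stated assumptions, not the sequential DI-sNMF procedure itself.

```python
import numpy as np

# Masked NMF: multiplicative updates weighted by a binary mask M, so
# entries flagged as missing (M == 0) do not constrain the factors;
# the fitted low-rank model W @ H then imputes those entries.
def masked_nmf(Y, M, rank, iters=200, eps=1e-9, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.random((Y.shape[0], rank))
    H = rng.random((rank, Y.shape[1]))
    for _ in range(iters):
        W *= (M * Y) @ H.T / ((M * (W @ H)) @ H.T + eps)
        H *= W.T @ (M * Y) / (W.T @ (M * (W @ H)) + eps)
    return W, H

rng = np.random.default_rng(1)
Y = rng.random((20, 4)) @ rng.random((4, 30))     # stand-in low-rank PSF library
M = (rng.random(Y.shape) > 0.1).astype(float)     # 0 marks "missing" pixels
W, H = masked_nmf(Y, M, rank=4)
resid = np.abs(Y - W @ H)[M == 0].mean()          # error on the imputed entries
print("mean imputation error:", resid)
```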