Inductive Learning
Data Science Tropes: Cowboys and Sirens
In this figure, the leaf nodes are the scores generated by the model, as inferred from the training examples residing in those leaf nodes. In this decision tree, a $123.56 foreign transaction is scored at a risk level of 200, while a $123.57 foreign transaction lands in a different leaf and receives a very different score. For a one-penny difference in amount, the gradient is then 70,000 score points per dollar (a 700-point swing), an output variation wildly disproportionate to the 1¢ difference in inputs. This example vividly illustrates that, with the data science and business worlds' increasing emphasis on the explainability, reliability, and consistency of models used for decisioning, decision trees lose much of their palatability.
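The arithmetic behind that gradient can be checked with a small sketch. Everything here is a hypothetical stand-in for the article's figure: the function name `risk_score`, the split point, and the right-leaf score of 900 are assumptions (900 is simply 200 plus the 700-point swing implied by 70,000 points per dollar over one cent). Amounts are kept in integer cents to avoid floating-point noise.

```python
def risk_score(amount_cents: int) -> int:
    """Hypothetical one-split decision tree on transaction amount (in cents)."""
    split_cents = 12356  # assumed split: amounts up to $123.56 fall in the left leaf
    left_leaf_score = 200   # score stated in the text for $123.56
    right_leaf_score = 900  # assumed: 200 + 70,000 points/dollar * $0.01
    return left_leaf_score if amount_cents <= split_cents else right_leaf_score


low_cents, high_cents = 12356, 12357  # $123.56 and $123.57
score_delta = risk_score(high_cents) - risk_score(low_cents)

# Score points per dollar: divide the score change by the change in dollars.
gradient_per_dollar = score_delta * 100 / (high_cents - low_cents)
print(score_delta, gradient_per_dollar)
```

A smooth model would spread that change over a range of amounts; a tree concentrates all of it at the split point, which is the discontinuity the passage objects to.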
Learning programs by learning from failures
We introduce learning programs by learning from failures. In this approach, an inductive logic programming (ILP) system (the learner) decomposes the learning problem into three separate stages: generate, test, and constrain. In the generate stage, the learner generates a hypothesis (a logic program) that satisfies a set of hypothesis constraints (constraints on the syntactic form of hypotheses). In the test stage, the learner tests the hypothesis against training examples. A hypothesis fails when it does not entail all the positive examples or entails a negative example. If a hypothesis fails, then, in the constrain stage, the learner learns constraints from the failed hypothesis to prune the hypothesis space, i.e. to constrain subsequent hypothesis generation. For instance, if a hypothesis is too general (entails a negative example), the constraints prune generalisations of the hypothesis. If a hypothesis is too specific (does not entail all the positive examples), the constraints prune specialisations of the hypothesis. This loop repeats until (1) the learner finds a hypothesis that entails all the positive and none of the negative examples, or (2) there are no more hypotheses to test. We implement our idea in Popper, an ILP system which combines answer set programming and Prolog. Popper supports infinite domains, reasoning about lists and numbers, learning optimal (textually minimal) programs, and learning recursive programs. Our experimental results on three diverse domains (number theory problems, robot strategies, and list transformations) show that (1) constraints drastically improve learning performance, and (2) Popper can substantially outperform state-of-the-art ILP systems, both in terms of predictive accuracies and learning times.
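The generate, test, and constrain loop described above can be sketched on a toy propositional problem. This is not Popper's implementation (which the abstract says combines answer set programming and Prolog); it is a minimal illustration under stated assumptions: a hypothesis is a set of clauses, each clause is a conjunction of literals, an example is a set of true literals, and all function names are hypothetical.

```python
from itertools import combinations


def entails(program, example):
    """A program entails an example if any clause's literals all hold in it."""
    return any(clause <= example for clause in program)


def generalises(p1, p2):
    """p1 is at least as general as p2 if each clause of p2 is
    subsumed by (is a superset of) some clause of p1."""
    return all(any(c1 <= c2 for c1 in p1) for c2 in p2)


def generate(literals, max_clauses, max_body):
    """Enumerate candidate programs, smallest total literal count first."""
    clauses = [frozenset(c) for n in range(1, max_body + 1)
               for c in combinations(sorted(literals), n)]
    programs = [frozenset(p) for k in range(1, max_clauses + 1)
                for p in combinations(clauses, k)]
    programs.sort(key=lambda p: sum(len(c) for c in p))  # stable sort by size
    return programs


def learn(literals, positives, negatives, max_clauses=2, max_body=2):
    too_general, too_specific = [], []  # constraints learned from failures
    tested = 0
    for program in generate(literals, max_clauses, max_body):
        # Generate stage: skip hypotheses pruned by learned constraints.
        if any(generalises(program, failed) for failed in too_general):
            continue  # generalisation of a hypothesis that entailed a negative
        if any(generalises(failed, program) for failed in too_specific):
            continue  # specialisation of a hypothesis that missed a positive
        # Test stage.
        tested += 1
        misses_positive = not all(entails(program, e) for e in positives)
        hits_negative = any(entails(program, e) for e in negatives)
        if not misses_positive and not hits_negative:
            return program, tested
        # Constrain stage: remember why the hypothesis failed.
        if hits_negative:
            too_general.append(program)
        if misses_positive:
            too_specific.append(program)
    return None, tested


positives = [frozenset("ab"), frozenset("c"), frozenset("ac")]
negatives = [frozenset("a"), frozenset("b"), frozenset()]
program, tested = learn({"a", "b", "c"}, positives, negatives)
print(sorted(sorted(clause) for clause in program), tested)
```

In this toy run the space holds 21 candidate programs, and the learned constraints let the loop skip most of them before it reaches a program that entails every positive and no negative example.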
Construction and Elicitation of a Black Box Model in the Game of Bridge
Ventos, Véronique, Braun, Daniel, Deheeger, Colin, Desmoulins, Jean Pierre, Fantun, Jean Baptiste, Legras, Swann, Rimbaud, Alexis, Rouveirol, Céline, Soldano, Henry
Our goal is to model expert decision processes in Bridge. To do so, we propose a methodology involving human experts, black box decision programs, and relational supervised machine learning systems. The aim is to obtain a global model for this decision process that is both expressive and has high predictive performance. Following the success of supervised methods of the deep network family, and growing societal pressure for automated decision processes to be made more transparent, a growing number of AI researchers are (re)exploring techniques to interpret, justify, or explain "black box" classifiers (referred to as the Black Box Outcome Explanation Problem [Guidotti et al., 2019]). It is a question of building, a posteriori, explicit models in symbolic languages, most often in the form of rules or deci...
Affiliations: Daniel Braun, Colin Deheeger, Jean Pierre Desmoulins, Jean Baptiste Fantun, Swann Legras, Alexis Rimbaud, Céline Rouveirol, Henry Soldano and Véronique Ventos (NukkAI, Paris, France); Henry Soldano and Céline Rouveirol (Université Sorbonne Paris-Nord, L.I.P.N UMR-CNRS 7030, Villetaneuse, France)
Yann LeCun and Yoshua Bengio: Self-supervised learning is the key to human-level intelligence
Self-supervised learning could lead to the creation of AI that's more human-like in its reasoning, according to Turing Award winners Yoshua Bengio and Yann LeCun. Bengio, director at the Montreal Institute for Learning Algorithms, and LeCun, Facebook VP and chief AI scientist, spoke candidly about this and other research trends during a session at the International Conference on Learning Representations (ICLR) 2020, which took place online. Supervised learning entails training an AI model on a labeled data set, and LeCun thinks it'll play a diminishing role as self-supervised learning comes into wider use. Instead of relying on annotations, self-supervised learning algorithms generate labels from data by exposing relationships among the data's parts, a step believed to be critical to achieving human-level intelligence. "Most of what we learn as humans and most of what animals learn is in a self-supervised mode, not a reinforcement mode. It's basically observing the world and interacting with it a little bit, mostly by observation in a task-independent way," said LeCun.
The ILASP system for Inductive Learning of Answer Set Programs
Law, Mark, Russo, Alessandra, Broda, Krysia
The goal of Inductive Logic Programming (ILP) is to learn a program that explains a set of examples in the context of some pre-existing background knowledge. Until recently, most research on ILP targeted learning Prolog programs. Our own ILASP system instead learns Answer Set Programs, including normal rules, choice rules and hard and weak constraints. Learning such expressive programs widens the applicability of ILP considerably; for example, enabling preference learning, learning common-sense knowledge, including defaults and exceptions, and learning non-deterministic theories. In this paper, we first give a general overview of ILASP's learning framework and its capabilities. This is followed by a comprehensive summary of the evolution of the ILASP system, presenting the strengths and weaknesses of each version, with a particular emphasis on scalability.
Supervised Contrastive Learning
Khosla, Prannay, Teterwak, Piotr, Wang, Chen, Sarna, Aaron, Tian, Yonglong, Isola, Phillip, Maschinot, Aaron, Liu, Ce, Krishnan, Dilip
Cross entropy is the most widely used loss function for supervised training of image classification models. In this paper, we propose a novel training methodology that consistently outperforms cross entropy on supervised learning tasks across different architectures and data augmentations. We modify the batch contrastive loss, which has recently been shown to be very effective at learning powerful representations in the self-supervised setting. We are thus able to leverage label information more effectively than cross entropy. Clusters of points belonging to the same class are pulled together in embedding space, while simultaneously pushing apart clusters of samples from different classes. In addition to this, we leverage key ingredients such as large batch sizes and normalized embeddings, which have been shown to benefit self-supervised learning. On both ResNet-50 and ResNet-200, we outperform cross entropy by over 1%, setting a new state of the art number of 78.8% among methods that use AutoAugment data augmentation. The loss also shows clear benefits for robustness to natural corruptions on standard benchmarks on both calibration and accuracy. Compared to cross entropy, our supervised contrastive loss is more stable to hyperparameter settings such as optimizers or data augmentations.
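The idea of pulling same-class embeddings together and pushing other classes apart can be sketched as a batch contrastive loss whose positives are defined by labels. The abstract does not give the formula, so the details below are assumptions: each anchor is contrasted against every other sample in the batch, the log-probabilities of its same-class samples are averaged, embeddings are L2-normalized, and the `temperature` value and function names are illustrative. Pure Python is used so the sketch has no dependencies; a real implementation would be vectorized.

```python
import math


def normalize(vector):
    """Scale a vector to unit length (normalized embeddings)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Mean over anchors of the negative average log-probability of positives."""
    z = [normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor with no same-class partner contributes nothing
        # Similarity of the anchor to every other sample in the batch.
        logits = {a: dot(z[i], z[a]) / temperature for a in range(n) if a != i}
        # Log-sum-exp with the maximum subtracted for numerical stability.
        peak = max(logits.values())
        log_denominator = peak + math.log(
            sum(math.exp(value - peak) for value in logits.values()))
        total -= sum(logits[p] - log_denominator for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


batch = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
clustered = supervised_contrastive_loss(batch, [0, 0, 1, 1])
scrambled = supervised_contrastive_loss(batch, [0, 1, 0, 1])
print(clustered < scrambled)
```

When the labels agree with the geometry of the embeddings the loss is small; when same-class points sit far apart it is large, which is the signal that drives clusters of the same class together during training.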
What's Next in AI? Self-supervised Learning
Self-supervised learning is one of those recent ML methods that have caused a ripple effect in the data science community, yet have so far flown under the radar as far as the Entrepreneurs and Fortunes of the world go; the general public has yet to hear of the idea, though many in the AI community regard it as a major step forward. The paradigm holds immense potential for enterprises too, as it can help tackle deep learning's most daunting issue: data/sample inefficiency and the costly training that follows. Yann LeCun said that if intelligence was a cake, unsupervised learning would be the cake, supervised learning would be the icing on the cake, and reinforcement learning would be the cherry on the cake. "We know how to make the icing and the cherry, but we don't know how to make the cake." He said unsupervised learning won't progress much, that there appears to be a massive conceptual disconnect about how exactly it should work, and that it was the dark matter of ...
A Visual Guide to Self-Labelling Images
In the past year, several methods for self-supervised learning of image representations have been proposed. A recent trend in these methods is the use of contrastive learning (SimCLR, PIRL, MoCo), which has given very promising results. However, as we saw in our survey on self-supervised learning, there exist many other problem formulations for self-supervised learning. One of them combines clustering and representation learning to learn both features and labels simultaneously. The Self-Labelling (SeLa) paper, presented at ICLR 2020 by Asano et al. of the Visual Geometry Group (VGG), University of Oxford, offers a new take on this approach and achieves state-of-the-art results on various benchmarks.
RHOG: A Refinement-Operator Library for Directed Labeled Graphs
Intuitively, local finiteness means that the refinement operator is computable, completeness means that we can generate, by refinement of a given element g, any element of G related to g by the order relation, and properness means that a refinement operator does not generate elements which are equivalent to the element being refined. When a refinement operator is locally finite, complete and proper, we say that it is ideal. Notice that all the subsumption relations presented above satisfy the reflexive and transitive properties. Therefore, the pair (G, ⊑), where G is the set of all DLGs given a set of labels L, and ⊑ is any of the subsumption relations defined above, is a quasi-ordered set. This opens the door to defining refinement operators for DLGs. Intuitively, a downward refinement operator for DLGs will generate refinements of a given DLG by adding vertices, adding edges, or making some of the labels more specific, thus making the graph more specific. In the following subsections, we will introduce a collection of refinement operators for connected DLGs, and discuss their theoretical properties. A summary of these operators is shown in Table 1, where we show that under the object-identity constraint, all the refinement operators presented in this document are ideal. If we do not impose object-identity, then the operators are locally finite and complete, but not proper.
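The three kinds of downward step (add an edge, add a vertex, specialise a label) can be sketched as follows. This is not RHOG's API and makes no claim to be ideal, complete, or proper. The graph encoding, the function name `refine_downward`, the single edge label `rel`, the most general vertex label `any`, and the label hierarchy are all assumptions made for illustration, and only vertex labels are specialised.

```python
def refine_downward(vertices, edges, children, top="any", edge_label="rel"):
    """Return graphs one illustrative step more specific than the input.

    vertices: dict mapping an integer vertex id to its label (non-empty)
    edges:    frozenset of (source, target, label) triples
    children: dict mapping a label to its more specific labels
    """
    refinements = []
    ids = sorted(vertices)

    # (1) Add a missing edge between two existing vertices.
    for u in ids:
        for v in ids:
            if u != v and (u, v, edge_label) not in edges:
                refinements.append((dict(vertices), edges | {(u, v, edge_label)}))

    # (2) Add a new vertex with the most general label, attached to an
    #     existing vertex in either direction so the graph stays connected.
    new_id = max(ids) + 1
    for u in ids:
        grown = dict(vertices)
        grown[new_id] = top
        refinements.append((dict(grown), edges | {(u, new_id, edge_label)}))
        refinements.append((dict(grown), edges | {(new_id, u, edge_label)}))

    # (3) Replace one vertex label with a more specific label.
    for u in ids:
        for child in children.get(vertices[u], ()):
            specialised = dict(vertices)
            specialised[u] = child
            refinements.append((specialised, edges))

    return refinements


hierarchy = {"any": ["person", "place"]}  # hypothetical label hierarchy
seed_vertices, seed_edges = {0: "any"}, frozenset()
steps = refine_downward(seed_vertices, seed_edges, hierarchy)
print(len(steps))
```

Each returned graph has at least as many vertices and edges as the input, or a more specific label, so every step moves downward in the subsumption order; ruling out equivalent refinements is the harder part that the properness property addresses.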
A negative case analysis of visual grounding methods for VQA
Shrestha, Robik, Kafle, Kushal, Kanan, Christopher
Existing Visual Question Answering (VQA) methods tend to exploit dataset biases and spurious statistical correlations, instead of producing right answers for the right reasons. To address this issue, recent bias mitigation methods for VQA propose to incorporate visual cues (e.g., human attention maps) to better ground the VQA models, showcasing impressive gains. However, we show that the performance improvements are not a result of improved visual grounding, but a regularization effect which prevents over-fitting to linguistic priors. For instance, we find that it is not actually necessary to provide proper, human-based cues; random, insensible cues also result in similar improvements. Based on this observation, we propose a simpler regularization scheme that does not require any external annotations and yet achieves near state-of-the-art performance on VQA-CPv2.