
Reviews: Monte-Carlo Tree Search by Best Arm Identification

Neural Information Processing Systems

This work applies best arm identification (BAI) techniques to the Monte-Carlo tree search problem with two players (in a turn-based setting). The goal is to find the best action for player A by carefully considering all the actions that players B and A can take in the following rounds. Access to a stochastic oracle that evaluates the values of the leaves is assumed, so the goal is to find an approximately best action at the root (with precision \epsilon) with high confidence (at least 1-\delta). The proposed algorithms are based on confidence intervals and on upward propagation (from the leaves to the root) of the upper confidence bounds (for the MAX nodes, where player A acts) and the lower confidence bounds (for the MIN nodes, where the opponent, player B, acts). The algorithms are intuitive, well described, and well rooted in the fixed-confidence BAI literature.
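The interval propagation described in the review can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes leaf rewards in [0, 1], uses a plain Hoeffding interval at the leaves, and omits the sampling rule and stopping rule entirely. The names `Node`, `leaf_interval`, and `interval` are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    kind: str  # "MAX" (player A), "MIN" (player B), or "LEAF"
    children: List["Node"] = field(default_factory=list)
    samples: List[float] = field(default_factory=list)  # oracle draws, leaves only


def leaf_interval(samples: List[float], delta: float) -> Tuple[float, float]:
    """Hoeffding interval for rewards in [0, 1] (an assumption of this sketch)."""
    n = len(samples)
    mean = sum(samples) / n
    radius = math.sqrt(math.log(2.0 / delta) / (2.0 * n))
    return max(0.0, mean - radius), min(1.0, mean + radius)


def interval(node: Node, delta: float) -> Tuple[float, float]:
    """Propagate (lower, upper) bounds from the leaves up to this node."""
    if node.kind == "LEAF":
        return leaf_interval(node.samples, delta)
    bounds = [interval(child, delta) for child in node.children]
    # A MAX node takes the max of both bounds; a MIN node takes the min.
    pick = max if node.kind == "MAX" else min
    return pick(lo for lo, _ in bounds), pick(hi for _, hi in bounds)


# Depth-2 example: player A picks a MIN node, then player B picks a leaf.
left = Node("MIN", [Node("LEAF", samples=[0.9] * 50), Node("LEAF", samples=[0.8] * 50)])
right = Node("MIN", [Node("LEAF", samples=[0.2] * 50), Node("LEAF", samples=[0.7] * 50)])
root = Node("MAX", [left, right])

left_lo, left_hi = interval(left, delta=0.1)
right_lo, right_hi = interval(right, delta=0.1)
root_lo, root_hi = interval(root, delta=0.1)
```

In this toy tree the lower bound of the left action exceeds the upper bound of the right action, so the left action could be declared best at the root; a full fixed-confidence algorithm would also need to decide which leaf to sample next until that separation is reached.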


Computational thematics: Comparing algorithms for clustering the genres of literary fiction

arXiv.org Artificial Intelligence

What are the best methods of capturing thematic similarity between literary texts? Knowing the answer to this question would be useful for automatic clustering of book genres, or any other thematic grouping. This paper compares a variety of algorithms for unsupervised learning of thematic similarities between texts, which we call "computational thematics". These algorithms belong to three steps of analysis: text preprocessing, extraction of text features, and measuring distances between the lists of features. Each of these steps includes a variety of options. We test all the possible combinations of these options: every combination of algorithms is given a task to cluster a corpus of books belonging to four pre-tagged genres of fiction. This clustering is then validated against the "ground truth" genre labels. Such comparison of algorithms allows us to learn the best and the worst combinations for computational thematic analysis. To illustrate the sharp difference between the best and the worst methods, we then cluster 5000 random novels from the HathiTrust corpus of fiction.


Self-supervised similarity models based on well-logging data

arXiv.org Artificial Intelligence

Adopting data-based approaches leads to model improvement in numerous oil and gas logging data processing problems. These improvements become even more pronounced thanks to new capabilities provided by deep learning. However, the use of deep learning is limited to areas where researchers possess large amounts of high-quality data. We present an approach that provides universal data representations suitable for solving different problems for different oil fields with little additional data. Our approach relies on a self-supervised methodology for sequential logging data from well intervals, so it does not require labelled data from the start. To validate the resulting representations, we consider classification and clustering problems. We also consider the transfer learning scenario. We found that using the variational autoencoder leads to the most reliable and accurate models. We also found that a researcher needs only a tiny separate data set for the target oil field to solve a specific problem on top of the universal representations.


Localized Uncertainty Attacks

arXiv.org Machine Learning

The susceptibility of deep learning models to adversarial perturbations has stirred renewed attention in adversarial examples, resulting in a number of attacks. However, most of these attacks fail to encompass a large spectrum of adversarial perturbations that are imperceptible to humans. In this paper, we present localized uncertainty attacks, a novel class of threat models against deterministic and stochastic classifiers. Under this threat model, we create adversarial examples by perturbing only regions in the inputs where a classifier is uncertain. To find such regions, we utilize the predictive uncertainty of the classifier when the classifier is stochastic, or we learn a surrogate model to amortize the uncertainty when it is deterministic. Unlike $\ell_p$ ball or functional attacks, which perturb inputs indiscriminately, our targeted changes can be less perceptible. When considered under our threat model, these attacks still produce strong adversarial examples, with the examples retaining a greater degree of similarity with the inputs.


What is the relationship between Curse of Dimensionality and isotropic neighborhoods?

#artificialintelligence

The problem that Hastie, Tibshirani and Friedman are talking about here is that the number of fixed-size neighborhoods goes up exponentially with the dimension. If you're trying to get some intuition for how isotropic neighborhoods are affected by the curse of dimensionality, think about approximating ball-shaped (isotropic) neighborhoods with cube-shaped neighborhoods. Suppose we have a $d$-dimensional unit cube $[0, 1]^d$ that we want to divide up into cube-shaped neighborhoods. If I want a neighborhood of side length $\delta = 0.1$, in one dimension this requires $10^1 = 10$ neighborhoods. In two dimensions, this requires $10^2 = 100$ neighborhoods.
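The counting argument generalizes directly: tiling the unit cube with cubes of side 0.1 takes 10 cells per axis, hence 10 to the power d cells in d dimensions. A small sketch of that arithmetic (the function name `num_neighborhoods` is made up for illustration, and the side length is assumed to divide 1 evenly):

```python
def num_neighborhoods(side: float, d: int) -> int:
    """Number of cube-shaped cells of the given side length tiling [0, 1]^d."""
    per_axis = round(1.0 / side)  # cells along one axis, e.g. 10 for side 0.1
    return per_axis ** d


for d in (1, 2, 3, 10):
    print(d, num_neighborhoods(0.1, d))
```

With a fixed budget of data points, the average number of points per cell therefore shrinks by a factor of 10 with every added dimension, which is why fixed-size local neighborhoods become empty so quickly.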


Tensorflow Image: Augmentation on GPU – Towards Data Science

#artificialintelligence

Here we are going to see different types of augmentations that can be applied to images. One of the most basic augmentations is flipping the image, which can double the data (depending on how you apply it). Random flipping: with a 1 in 2 chance, your image will be flipped horizontally or vertically. Alternatively, you can also use tf.reverse for the same purpose. Rotation: the image will be rotated by 90 degrees in the counter-clockwise direction, k times.
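To make the effect of these operations concrete, here is a framework-free sketch that treats an image as a nested list of pixel rows. It deliberately does not use TensorFlow, so it runs on the CPU and only illustrates what flipping and rotating do to the pixel grid; all function names below are hypothetical and are not TensorFlow APIs.

```python
import random
from typing import List

Image = List[List[int]]  # rows of pixel values, a stand-in for an image tensor


def flip_left_right(img: Image) -> Image:
    """Mirror each row (horizontal flip)."""
    return [row[::-1] for row in img]


def flip_up_down(img: Image) -> Image:
    """Reverse the order of the rows (vertical flip)."""
    return [row[:] for row in img[::-1]]


def random_flip_left_right(img: Image, rng: random.Random) -> Image:
    """Flip horizontally with probability 1/2, otherwise return a copy."""
    return flip_left_right(img) if rng.random() < 0.5 else [row[:] for row in img]


def rot90_ccw(img: Image, k: int = 1) -> Image:
    """Rotate counter-clockwise by 90 degrees, k times."""
    out = [row[:] for row in img]
    for _ in range(k % 4):
        # Transpose, then reverse the row order: one counter-clockwise quarter turn.
        out = [list(col) for col in zip(*out)][::-1]
    return out


img = [[1, 2],
       [3, 4]]
print(flip_left_right(img))   # [[2, 1], [4, 3]]
print(flip_up_down(img))      # [[3, 4], [1, 2]]
print(rot90_ccw(img, k=1))    # [[2, 4], [1, 3]]
```

Applying a flip twice, or a quarter turn four times, returns the original image, which is a quick sanity check for any implementation of these augmentations.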


Fairer and more accurate, but for whom?

arXiv.org Machine Learning

Complex statistical machine learning models are increasingly being used or considered for use in high-stakes decision-making pipelines in domains such as financial services, health care, criminal justice and human services. These models are often investigated as possible improvements over more classical tools such as regression models or human judgement. While the modeling approach may be new, the practice of using some form of risk assessment to inform decisions is not. When determining whether a new model should be adopted, it is therefore essential to be able to compare the proposed model to the existing approach across a range of task-relevant accuracy and fairness metrics. Looking at overall performance metrics, however, may be misleading. Even when two models have comparable overall performance, they may nevertheless disagree in their classifications on a considerable fraction of cases. In this paper we introduce a model comparison framework for automatically identifying subgroups in which the differences between models are most pronounced. Our primary focus is on identifying subgroups where the models differ in terms of fairness-related quantities such as racial or gender disparities. We present experimental results from a recidivism prediction task and a hypothetical lending example.


Improved Error Bounds Based on Worst Likely Assignments

arXiv.org Machine Learning

Error bounds based on worst likely assignments use permutation tests to validate classifiers. Worst likely assignments can produce effective bounds even for data sets with 100 or fewer training examples. This paper introduces a statistic for use in the permutation tests of worst likely assignments that improves error bounds, especially for accurate classifiers, which are typically the classifiers of interest.