AITopics

Learning in the presence of outliers is a fundamental problem in statistics. Until recently, all known efficient unsupervised learning algorithms were very sensitive to outliers in high dimensions. In particular, even for the task of robust mean estimation under natural distributional assumptions, no efficient algorithm was known. Recent work in theoretical computer science gave the first efficient robust estimators for a number of fundamental statistical tasks, including mean and covariance estimation. Since then, there has been a flurry of research activity on algorithmic high-dimensional robust estimation in a range of settings. In this survey article, we introduce the core ideas and algorithmic techniques in the emerging area of algorithmic high-dimensional robust statistics with a focus on robust mean estimation. We also provide an overview of the approaches that have led to computationally efficient robust estimators for a range of broader statistical tasks and discuss new directions and opportunities for future work.

algorithm, estimation, mean estimation, (15 more...)

1911.05911

Country:

North America > United States > New York (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
(3 more...)

Genre: Overview (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)

Explainable Ordinal Factorization Model: Deciphering the Effects of Attributes by Piece-wise Linear Approximation

Guo, Mengzhuo, Xu, Zhongzhi, Zhang, Qingpeng, Liao, Xiuwu, Liu, Jiapeng

Ordinal regression predicts the objects' labels that exhibit a natural ordering, which is important to many managerial problems such as credit scoring and clinical diagnosis. In these problems, the ability to explain how the attributes affect the prediction is critical to users. However, most, if not all, existing ordinal regression models simplify such explanation in the form of constant coefficients for the main and interaction effects of individual attributes. Such explanation cannot characterize the contributions of attributes at different value scales. To address this challenge, we propose a new explainable ordinal regression model, namely, the Explainable Ordinal Factorization Model (XOFM). XOFM uses the piece-wise linear functions to approximate the actual contributions of individual attributes and their interactions. Moreover, XOFM introduces a novel ordinal transformation process to assign each object the probabilities of belonging to multiple relevant classes, instead of fixing boundaries to differentiate classes. XOFM is based on the Factorization Machines to handle the potential sparsity problem as a result of discretizing the attribute scales. Comprehensive experiments with benchmark datasets and baseline models demonstrate that the proposed XOFM exhibits superior explainability and leads to state-of-the-art prediction accuracy.

interaction, ordinal regression problem, xofm, (11 more...)

1911.05909

Country:

Asia > China > Shaanxi Province > Xi'an (0.04)
Asia > China > Hong Kong (0.04)
North America > United States > Massachusetts > Plymouth County > Norwell (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine (0.94)
Banking & Finance > Credit (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.87)

There is Limited Correlation between Coverage and Robustness for Deep Neural Networks

Dong, Yizhen, Zhang, Peixin, Wang, Jingyi, Liu, Shuang, Sun, Jun, Hao, Jianye, Wang, Xinyu, Wang, Li, Dong, Jin Song, Ting, Dai

Deep neural networks (DNN) are increasingly applied in safety-critical systems, e.g., for face recognition, autonomous car control and malware detection. It is also shown that DNNs are subject to attacks such as adversarial perturbation and thus must be properly tested. Many coverage criteria for DNN since have been proposed, inspired by the success of code coverage criteria for software programs. The expectation is that if a DNN is a well tested (and retrained) according to such coverage criteria, it is more likely to be robust. In this work, we conduct an empirical study to evaluate the relationship between coverage, robustness and attack/defense metrics for DNN. Our study is the largest to date and systematically done based on 100 DNN models and 25 metrics. One of our findings is that there is limited correlation between coverage and robustness, i.e., improving coverage does not help improve the robustness. Our dataset and implementation have been made available to serve as a benchmark for future studies on testing DNN.

correlation, coverage criteria, robustness, (13 more...)

1911.05904

Country:

North America > United States > District of Columbia > Washington (0.05)
Asia > China > Tianjin Province > Tianjin (0.04)
Asia > Singapore (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (0.87)
Transportation > Ground > Road (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Coincidence, Categorization, and Consolidation: Learning to Recognize Sounds with Minimal Supervision

Jansen, Aren, Ellis, Daniel P. W., Hershey, Shawn, Moore, R. Channing, Plakal, Manoj, Popat, Ashok C., Saurous, Rif A.

Humans do not acquire perceptual abilities in the way we train machines. While machine learning algorithms typically operate on large collections of randomly-chosen, explicitly-labeled examples, human acquisition relies more heavily on multimodal unsupervised learning (as infants) and active learning (as children). With this motivation, we present a learning framework for sound representation and recognition that combines (i) a self-supervised objective based on a general notion of unimodal and cross-modal coincidence, (ii) a clustering objective that reflects our need to impose categorical structure on our experiences, and (iii) a cluster-based active learning procedure that solicits targeted weak supervision to consolidate categories into relevant semantic classes. By training a combined sound embedding/clustering/classification network according to these criteria, we achieve a new state-of-the-art unsupervised audio representation and demonstrate up to a 20-fold reduction in the number of labels required to reach a desired classification performance.

learning, procedure, representation, (13 more...)

1911.05894

Country:

North America > United States > California > Santa Clara County > Mountain View (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.50)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.67)

Deng, Zeyu, Kammoun, Abla, Thrampoulidis, Christos

A Model of Double Descent for High-dimensional Binary Linear Classification

We consider a model for logistic regression where only a subset of features of size $p$ is used for training a linear classifier over $n$ training samples. The classifier is obtained by running gradient-descent (GD) on the logistic-loss. For this model, we investigate the dependence of the generalization error on the overparameterization ratio $\kappa=p/n$. First, building on known deterministic results on convergence properties of the GD, we uncover a phase-transition phenomenon for the case of Gaussian regressors: the generalization error of GD is the same as that of the maximum-likelihood (ML) solution when $\kappa<\kappa_\star$, and that of the max-margin (SVM) solution when $\kappa>\kappa_\star$. Next, using the convex Gaussian min-max theorem (CGMT), we sharply characterize the performance of both the ML and SVM solutions. Combining these results, we obtain curves that explicitly characterize the generalization error of GD for varying values of $\kappa$. The numerical results validate the theoretical predictions and unveil double-descent phenomena that complement similar recent observations in linear regression settings.

arxiv preprint arxiv, proposition 3, regression, (15 more...)

1911.05822

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Saudi Arabia (0.04)

Genre:

Research Report > New Finding (0.49)
Research Report > Experimental Study (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Misra, Dipendra, Henaff, Mikael, Krishnamurthy, Akshay, Langford, John

Kinematic State Abstraction and Provably Efficient Rich-Observation Reinforcement Learning

We present an algorithm, HOMER, for exploration and reinforcement learning in rich observation environments that are summarizable by an unknown latent state space. The algorithm interleaves representation learning to identify a new notion of kinematic state abstraction with strategic exploration to reach new states using the learned abstraction. The algorithm provably explores the environment with sample complexity scaling polynomially in the number of latent states and the time horizon, and, crucially, with no dependence on the size of the observation space, which could be infinitely large. This exploration guarantee further enables sample-efficient global policy optimization for any reward function. On the computational side, we show that the algorithm can be implemented efficiently whenever certain supervised learning problems are tractable. Empirically, we evaluate HOMER on a challenging exploration problem, where we show that the algorithm is exponentially more sample efficient than standard reinforcement learning baselines.

abstraction, policy cover, probability, (16 more...)

1911.05815

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
North America > United States > Illinois (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry: Education (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Vaz, Yule, de Mello, Rodrigo Fernandes, Grossi, Carlos Henrique

Coarse-Refinement Dilemma: On Generalization Bounds for Data Clustering

This paper is organized as follows: Section 2 briefly introduces some studies related to the formalization of theoretical frameworks in the context of the Data Clustering (DC) problem; Section 3 introduces a general formulation for the DC and HC problems; Section 4 discusses the Coarse-Refinement Dilemma considering the homology group H 0; Section 5 shows that homology groups of degree greater than zero are affected by overrefined and over-coarsed topologies; Section 6 compares our proposed generalization bounds to Carlsson and M emoli [12]'s consistency; finally, conclusions and future directions are provided in Section 8. 2. Related work Data Clustering (DC) faces many challenges in defining and guaranteeing generalization from datasets, as it does not rely on labels and, consequently, it cannot take advantage of computing any evident error measurement such as risk [7]. While studying this issue, Kleinberg [8] considered that a clustering model is an application of a mapping f on top of a distance function d: I I R, given I contains indices of data points in some fixed-size set S, disregarding its ambient space though [25]. From this initial setup, Kleinberg [8] defined three properties to be respected in order to assess clustering algorithms and models: - Scale-invariance: Given a distance and a clustering function, d and f, and a scalar α, the following must hold f (d) f (αd). Thus, the similarity representation over S must be consistent with the units of measurement; - Consistency: Let Γ be a partition of S and d,d null two distance functions. Function d null is referred to as a Γ transformation of d if: (i) for all i,j S belonging to the same cluster, d null (i,j) d( i,j); and (ii) for all i,j S belonging to different clusters, d null (i,j) d( i,j). Consistency holds if f (d null) f ( d) whenever d null is a Σ transformation of d.

dataset, homology class, topological space, (16 more...)

1911.05806

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Switzerland (0.04)
South America > Brazil (0.04)
(5 more...)

Genre: Research Report (0.81)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Manino, Edoardo, Tran-Thanh, Long, Jennings, Nicholas R.

Streaming Bayesian Inference for Crowdsourced Classification

A key challenge in crowdsourcing is inferring the ground truth from noisy and unreliable data. To do so, existing approaches rely on collecting redundant information from the crowd, and aggregating it with some probabilistic method. However, oftentimes such methods are computationally inefficient, are restricted to some specific settings, or lack theoretical guarantees. In this paper, we revisit the problem of binary classification from crowdsourced data. Specifically we propose Streaming Bayesian Inference for Crowdsourcing (SBIC), a new algorithm that does not suffer from any of these limitations. First, SBIC has low complexity and can be used in a real-time online setting. Second, SBIC has the same accuracy as the best state-of-the-art algorithms in all settings. Third, SBIC has provable asymptotic guarantees both in the online and offline settings.

algorithm, proceedings, sbic, (16 more...)

1911.05712

Country:

Asia > Middle East > Jordan (0.04)
North America > Canada (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Communications > Social Media > Crowdsourcing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Diddigi, Raghuram Bharadwaj, Kamanchi, Chandramouli, Bhatnagar, Shalabh

A Convergent Off-Policy Temporal Difference Algorithm

Learning the value function of a given policy (target policy) from the data samples obtained from a different policy (behavior policy) is an important problem in Reinforcement Learning (RL). This problem is studied under the setting of off-policy prediction. Temporal Difference (TD) learning algorithms are a popular class of algorithms for solving the prediction problem. TD algorithms with linear function approximation are shown to be convergent when the samples are generated from the target policy (known as on-policy prediction). However, it has been well established in the literature that off-policy TD algorithms under linear function approximation diverge. In this work, we propose a convergent on-line off-policy TD algorithm under linear function approximation. The main idea is to penalize the updates of the algorithm in a way as to ensure convergence of the iterates. We provide a convergence analysis of our algorithm. Through numerical evaluations, we further demonstrate the effectiveness of our algorithm.

algorithm, approximation, value function, (14 more...)

1911.05697

Country:

Asia > India > Karnataka > Bengaluru (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Rauber, Jonas, Fox, Emily B., Gatys, Leon A.

Modeling patterns of smartphone usage and their relationship to cognitive health

The ubiquity of smartphone usage in many people's lives make it a rich source of information about a person's mental and cognitive state. In this work we analyze 12 weeks of phone usage data from 113 older adults, 31 with diagnosed cognitive impairment and 82 without. We develop structured models of users' smartphone interactions to reveal differences in phone usage patterns between people with and without cognitive impairment. In particular, we focus on inferring specific types of phone usage sessions that are predictive of cognitive impairment. Our model achieves an AUROC of 0.79 when discriminating between healthy and symptomatic subjects, and its interpretability enables novel insights into which aspects of phone usage strongly relate with cognitive health in our dataset.

cognitive health, proceedings, session type, (15 more...)

1911.05683

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Communications > Mobile (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)