AITopics

1810.05546

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

arXiv.org Machine LearningOct-22-2018

Implicit Maximum Likelihood Estimation

Li, Ke, Malik, Jitendra

Implicit probabilistic models are models defined naturally in terms of a sampling procedure and often induces a likelihood function that cannot be expressed explicitly. We develop a simple method for estimating parameters in implicit models that does not require knowledge of the form of the likelihood function or any derived quantities, but can be shown to be equivalent to maximizing likelihood under some conditions. Our result holds in the non-asymptotic parametric setting, where both the capacity of the model and the number of data examples are finite. We also demonstrate encouraging experimental results.

artificial intelligence, bayesian inference, machine learning, (16 more...)

1809.09087

Country: North America > United States > California (0.28)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)

arXiv.org Machine LearningOct-20-2018

Hybrid-MST: A Hybrid Active Sampling Strategy for Pairwise Preference Aggregation

Li, Jing, Mantiuk, Rafal K., Wang, Junle, Ling, Suiyi, Callet, Patrick Le

In this paper we present a hybrid active sampling strategy for pairwise preference aggregation, which aims at recovering the underlying rating of the test candidates from sparse and noisy pairwise labelling. Our method employs Bayesian optimization framework and Bradley-Terry model to construct the utility function, then to obtain the Expected Information Gain (EIG) of each pair. For computational efficiency, Gaussian-Hermite quadrature is used for estimation of EIG. In this work, a hybrid active sampling strategy is proposed, either using Global Maximum (GM) EIG sampling or Minimum Spanning Tree (MST) sampling in each trial, which is determined by the test budget. The proposed method has been validated on both simulated and real-world datasets, where it shows higher preference aggregation ability than the state-of-the-art methods.

artificial intelligence, bayesian inference, machine learning, (19 more...)

1810.08851

Country: Europe (0.46)

Genre:

Research Report > New Finding (0.48)
Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.81)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Duchi, John, Namkoong, Hongseok

Learning Models with Uniform Performance via Distributionally Robust Optimization

In many applications of statistics and machine learning, we wish to learn models that achieve uniformly good performance over almost all input values. This is important for safety-and fairnesscritical systems such as medical diagnosis, autonomous vehicles, criminal justice and credit evaluations, where poor performance on the tails of the inputs leads to high-cost system failures. Methods that optimize average performance, however, often produce models that suffer low performance on the "hard" instances of the population. For example, standard regressors obtained from maximum likelihood estimation can lose their predictive power on certain regions of covariates [57], so that high average performance comes at the expense of low performance on minority subpopulations. In this work, we propose and study a procedure that explicitly optimizes performance on tail inputs that suffer high loss. Modern datasets incorporate heterogeneous (but latent) subpopulations, and a natural goal is to perform well across all of these [57, 65, 21]. While many statistical models show strong average performance, their performance often deteriorates on minority groups underrepresented in the dataset. For example, speech recognition systems are inaccurate for people with minority accents [4]. In numerous other applications--such as facial recognition, automatic video captioning, language identification, academic recommender systems--performance varies significantly over different demographic groupings, such as race, gender, or age [38, 42, 18, 68, 76].

artificial intelligence, bayesian inference, machine learning, (19 more...)

1810.0875

Country:

Europe (0.45)
North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.54)

Alipourfard, Borzou, Gao, Jean X.

Renormalized Normalized Maximum Likelihood and Three-Part Code Criteria For Learning Gaussian Networks

Score based learning (SBL) is a promising approach for learning Bayesian networks in the discrete domain. However, when employing SBL in the continuous domain, one is either forced to move the problem to the discrete domain or use metrics such as BIC/AIC, and these approaches are often lacking. Discretization can have an undesired impact on the accuracy of the results, and BIC/AIC can fall short of achieving the desired accuracy. In this paper, we introduce two new scoring metrics for scoring Bayesian networks in the continuous domain: the three-part minimum description length and the renormalized normalized maximum likelihood metric. We rely on the minimum description length principle in formulating these metrics. The metrics proposed are free of hyperparameters, decomposable, and are asymptotically consistent. We evaluate our solution by studying the convergence rate of the learned graph to the generating network and, also, the structural hamming distance of the learned graph to the generating network. Our evaluations show that the proposed metrics outperform their competitors, the BIC/AIC metrics. Furthermore, using the proposed RNML metric, SBL will have the fastest rate of convergence with the smallest structural hamming distance to the generating network.

artificial intelligence, bayesian network, machine learning, (17 more...)

1810.08749

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Freund, Robert M., Grigas, Paul, Mazumder, Rahul

Condition Number Analysis of Logistic Regression, and its Implications for Standard First-Order Solution Methods

Logistic regression is one of the most popular methods in binary classification, wherein estimation of model parameters is carried out by solving the maximum likelihood (ML) optimization problem, and the ML estimator is defined to be the optimal solution of this problem. It is well known that the ML estimator exists when the data is non-separable, but fails to exist when the data is separable. First-order methods are the algorithms of choice for solving large-scale instances of the logistic regression problem. In this paper, we introduce a pair of condition numbers that measure the degree of non-separability or separability of a given dataset in the setting of binary classification, and we study how these condition numbers relate to and inform the properties and the convergence guarantees of first-order methods. When the training data is non-separable, we show that the degree of non-separability naturally enters the analysis and informs the properties and convergence guarantees of two standard first-order methods: steepest descent (for any given norm) and stochastic gradient descent. Expanding on the work of Bach, we also show how the degree of non-separability enters into the analysis of linear convergence of steepest descent (without needing strong convexity), as well as the adaptive convergence of stochastic gradient descent. When the training data is separable, first-order methods rather curiously have good empirical success, which is not well understood in theory. In the case of separable data, we demonstrate how the degree of separability enters into the analysis of $\ell_2$ steepest descent and stochastic gradient descent for delivering approximate-maximum-margin solutions with associated computational guarantees as well. This suggests that first-order methods can lead to statistically meaningful solutions in the separable case, even though the ML solution does not exist.

artificial intelligence, descent, machine learning, (17 more...)

1810.08727

Country:

North America > United States > Massachusetts (0.28)
North America > United States > California (0.28)

Genre: Research Report > New Finding (0.89)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Zhang, Quan, Zhou, Mingyuan

Nonparametric Bayesian Lomax delegate racing for survival analysis with competing risks

We propose Lomax delegate racing (LDR) to explicitly model the mechanism of survival under competing risks and to interpret how the covariates accelerate or decelerate the time to event. LDR explains non-monotonic covariate effects by racing a potentially infinite number of sub-risks, and consequently relaxes the ubiquitous proportional-hazards assumption which may be too restrictive. Moreover, LDR is naturally able to model not only censoring, but also missing event times or event types. For inference, we develop a Gibbs sampler under data augmentation for moderately sized data, along with a stochastic gradient descent maximum a posteriori inference algorithm for big data applications. Illustrative experiments are provided on both synthetic and real datasets, and comparison with various benchmark algorithms for survival analysis with competing risks demonstrates distinguished performance of LDR.

artificial intelligence, c-index, machine learning, (18 more...)

1810.08564

Country: North America > United States > Texas (0.28)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Duan, Leo L, Dunson, David B

Bayesian Distance Clustering

Model-based clustering is widely-used in a variety of application areas. However, fundamental concerns remain about robustness. In particular, results can be sensitive to the choice of kernel representing the within-cluster data density. Leveraging on properties of pairwise differences between data points, we propose a class of Bayesian distance clustering methods, which rely on modeling the likelihood of the pairwise distances in place of the original data. Although some information in the data is discarded, we gain substantial robustness to modeling assumptions. The proposed approach represents an appealing middle ground between distance- and model-based clustering, drawing advantages from each of these canonical approaches. We illustrate dramatic gains in the ability to infer clusters that are not well represented by the usual choices of kernel. A simulation study is included to assess performance relative to competitors, and we apply the approach to clustering of brain genome expression data. Keywords: Distance-based clustering; Mixture model; Model-based clustering; Model misspecification; Pairwise distance matrix; Partial likelihood; Robustness.

artificial intelligence, likelihood, machine learning, (16 more...)

1810.08537

Country: North America (0.46)

Genre: Research Report (0.40)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Fanzeres, Leonardo A., Vivacqua, Adriana S., Biscainho, Luiz W. P.

Mobile Sound Recognition for the Deaf and Hard of Hearing

arXiv.org Artificial IntelligenceOct-19-2018

Human perception of surrounding events is strongly dependent on audio cues. Thus, acoustic insulation can seriously impact situational awareness. We present an exploratory study in the domain of assistive computing, eliciting requirements and presenting solutions to problems found in the development of an environmental sound recognition system, which aims to assist deaf and hard of hearing people in the perception of sounds. To take advantage of smartphones computational ubiquity, we propose a system that executes all processing on the device itself, from audio features extraction to recognition and visual presentation of results. Our application also presents the confidence level of the classification to the user. A test of the system conducted with deaf users provided important and inspiring feedback from participants.

artificial intelligence, data mining, machine learning, (21 more...)

arXiv.org Artificial Intelligence

1810.08707

Country:

South America > Brazil (0.68)
North America > United States > New York (0.14)

Genre: Research Report > New Finding (0.67)

Industry: Health & Medicine > Therapeutic Area > Otolaryngology (0.72)

Technology:

Information Technology > Human Computer Interaction (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Mobile (1.00)
(4 more...)

Islam, Muhammad Asiful, Srikant, Ramakrishnan, Basu, Sugato

Micro-Browsing Models for Search Snippets

arXiv.org Machine LearningOct-18-2018

Abstract--Click-through rate (CTR) is a key signal of relevance for search engine results, both organic and sponsored. CTR of a result has two core components: (a) the probability of examination of a result by a user, and (b) the perceived relevance of the result given that it has been examined by the user . There has been considerable work on user browsing models, to model and analyze both the examination and the relevance components of CTR. In this paper, we propose a novel formulation: a micro-browsing model for how users read result snippets. The snippet text of a result often plays a critical role in the perceived relevance of the result. We study how particular words within a line of snippet can influence user behavior . We validate this new micro-browsing user model by considering the problem of predicting which snippet will yield higher CTR, and show that classification accuracy is dramatically higher with our micro-browsing user model. The key insight in this paper is that varying relatively few words within a snippet, and even their location within a snippet, can have a significant influence on the clickthrough of a snippet. Web search engines have become an essential tool for navigating the vast amounts of information on the internet.

micro-browsing model, relevance, snippet, (13 more...)

1810.08223

Country: North America > United States > California > Santa Clara County > Mountain View (0.04)

Genre: Research Report (0.85)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)