AITopics | Directed Networks

Collaborating Authors

Directed Networks

News Overviews Instructional Materials AI-Alerts Classics

Machine Learning Testing: Survey, Landscapes and Horizons

Zhang, Jie M., Harman, Mark, Ma, Lei, Liu, Yang

arXiv.org Artificial IntelligenceJun-19-2019

This paper provides a comprehensive survey of Machine Learning Testing (ML testing) research. It covers 128 papers on testing properties (e.g., correctness, robustness, and fairness), testing components (e.g., the data, learning program, and framework), testing workflow (e.g., test generation and test evaluation), and application scenarios (e.g., autonomous driving, machine translation). The paper also analyses trends concerning datasets, research trends, and research focus, concluding with research challenges and promising research directions in ML testing.

machine learning, ml testing, natural language, (18 more...)

arXiv.org Artificial Intelligence

1906.10742

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Japan > Kyūshū & Okinawa > Kyūshū (0.04)
Asia > China > Hong Kong (0.04)
(12 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.87)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
Banking & Finance (1.00)
(6 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
(4 more...)

Add feedback

Bayesian inverse regression for supervised dimension reduction with small datasets

Cai, Xin, Lin, Guang, Li, Jinglai

arXiv.org Machine LearningJun-19-2019

We consider supervised dimension reduction problems, namely to identify a low dimensional projection of the predictors $\-x$ which can retain the statistical relationship between $\-x$ and the response variable $y$. We follow the idea of the sliced inverse regression (SIR) class of methods, which is to use the statistical information of the conditional distribution $\pi(\-x|y)$ to identify the dimension reduction (DR) space and in particular we focus on the task of computing this conditional distribution. We propose a Bayesian framework to compute the conditional distribution where the likelihood function is obtained using the Gaussian process regression model. The conditional distribution $\pi(\-x|y)$ can then be obtained directly by assigning weights to the original data points. We then can perform DR by considering certain moment functions (e.g. the first moment) of the samples of the posterior distribution. With numerical examples, we demonstrate that the proposed method is especially effective for small data problems.

artificial intelligence, machine learning, regression, (18 more...)

arXiv.org Machine Learning

1906.08018

Country:

Asia > China > Shanghai > Shanghai (0.04)
North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
North America > United States > Indiana > Tippecanoe County > Lafayette (0.04)
(2 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)

Add feedback

Learning Directed Graphical Models from Gaussian Data

Fitch, Katherine

arXiv.org Machine LearningJun-19-2019

In this paper, we introduce two new directed graphical models from Gaussian data: the Gaussian graphical interaction model (GGIM) and the Gaussian graphical conditional expectation model (GGCEM). The development of these models comes from considering stationary Gaussian processes on graphs, and leveraging the equations between the resulting steady-state covariance matrix and the Laplacian matrix representing the interaction graph. Through the presentation of conceptually straightforward theory, we develop the new models and provide interpretations of the edges in each graphical model in terms of statistical measures. We show that when restricted to undirected graphs, the Laplacian matrix representing a GGIM is equivalent to the standard inverse covariance matrix that encodes conditional dependence relationships. We demonstrate that the problem of learning sparse GGIMs and GGCEMs for a given observation set can be framed as a LASSO problem. By comparison with the problem of inverse covariance estimation, we prove a bound on the difference between the covariance matrix corresponding to a sparse GGIM and the covariance matrix corresponding to the $l_1$-norm penalized maximum log-likelihood estimate. In all, the new models present a novel perspective on directed relationships between variables and significantly expand on the state of the art in Gaussian graphical modeling.

artificial intelligence, machine learning, matrix, (16 more...)

arXiv.org Machine Learning

1906.0805

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.96)

Add feedback

Adversarial Self-Paced Learning for Mixture Models of Hawkes Processes

Luo, Dixin, Xu, Hongteng, Carin, Lawrence

arXiv.org Machine LearningJun-19-2019

We propose a novel adversarial learning strategy for mixture models of Hawkes processes, leveraging data augmentation techniques of Hawkes process in the framework of self-paced learning. Instead of learning a mixture model directly from a set of event sequences drawn from different Hawkes processes, the proposed method learns the target model iteratively, which generates "easy" sequences and uses them in an adversarial and self-paced manner. In each iteration, we first generate a set of augmented sequences from original observed sequences. Based on the fact that an easy sample of the target model can be an adversarial sample of a misspecified model, we apply a maximum likelihood estimation with an adversarial self-paced mechanism. In this manner the target model is updated, and the augmented sequences that obey it are employed for the next learning iteration. Experimental results show that the proposed method outperforms traditional methods consistently.

adversarial self-paced learning, bayesian inference, machine learning, (3 more...)

arXiv.org Machine Learning

1906.08397

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.53)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.53)

Add feedback

The Broad Optimality of Profile Maximum Likelihood

Hao, Yi, Orlitsky, Alon

arXiv.org Machine LearningJun-19-2019

We study three fundamental statistical-learning problems: distribution estimation, property estimation, and property testing. We establish the profile maximum likelihood (PML) estimator as the first unified sample-optimal approach to a wide range of learning tasks. In particular, for every alphabet size $k$ and desired accuracy $\varepsilon$: $\textbf{Distribution estimation}$ Under $\ell_1$ distance, PML yields optimal $\Theta(k/(\varepsilon^2\log k))$ sample complexity for sorted-distribution estimation, and a PML-based estimator empirically outperforms the Good-Turing estimator on the actual distribution; $\textbf{Additive property estimation}$ For a broad class of additive properties, the PML plug-in estimator uses just four times the sample size required by the best estimator to achieve roughly twice its error, with exponentially higher confidence; $\boldsymbol{\alpha}\textbf{-R\'enyi entropy estimation}$ For integer $\alpha>1$, the PML plug-in estimator has optimal $k^{1-1/\alpha}$ sample complexity; for non-integer $\alpha>3/4$, the PML plug-in estimator has sample complexity lower than the state of the art; $\textbf{Identity testing}$ In testing whether an unknown distribution is equal to or at least $\varepsilon$ far from a given distribution in $\ell_1$ distance, a PML-based tester achieves the optimal sample complexity up to logarithmic factors of $k$. With minor modifications, most of these results also hold for a near-linear-time computable variant of PML.

estimation, estimator, probability, (15 more...)

arXiv.org Machine Learning

1906.03794

Country:

North America > United States > California > San Diego County > San Diego (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (1.00)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.70)

Add feedback

Minimum Stein Discrepancy Estimators

Barp, Alessandro, Briol, Francois-Xavier, Duncan, Andrew B., Girolami, Mark, Mackey, Lester

arXiv.org Machine LearningJun-19-2019

When maximum likelihood estimation is infeasible, one often turns to score matching, contrastive divergence, or minimum probability flow learning to obtain tractable parameter estimates. We provide a unifying perspective of these techniques as minimum Stein discrepancy estimators and use this lens to design new diffusion kernel Stein discrepancy (DKSD) and diffusion score matching (DSM) estimators with complementary strengths. We establish the consistency, asymptotic normality, and robustness of DKSD and DSM estimators, derive stochastic Riemannian gradient descent algorithms for their efficient optimization, and demonstrate their advantages over score matching in models with non-smooth densities or heavy tailed distributions.

bayesian inference, machine learning, minimum stein discrepancy estimator, (1 more...)

arXiv.org Machine Learning

1906.08283

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.53)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.53)

Add feedback

Variational Gaussian Processes with Signature Covariances

Toth, Csaba, Oberhauser, Harald

arXiv.org Machine LearningJun-19-2019

We introduce a Bayesian approach to learn from stream-valued data by using Gaussian processes with the recently introduced signature kernel as covariance function. To cope with the computational complexity in time and memory that arises with long streams that evolve in large state spaces, we develop a variational Bayes approach with sparse inducing tensors. We provide an implementation based on GPFlow and benchmark this variational Gaussian process model on supervised classification tasks for time series and text (a stream of words).

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Machine Learning

1906.08215

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

Add feedback

BatchBALD: Efficient and Diverse Batch Acquisition for Deep Bayesian Active Learning

Kirsch, Andreas, van Amersfoort, Joost, Gal, Yarin

arXiv.org Machine LearningJun-19-2019

We develop BatchBALD, a tractable approximation to the mutual information between a batch of points and model parameters, which we use as an acquisition function to select multiple informative points jointly for the task of deep Bayesian active learning. BatchBALD is a greedy linear-time $1 - \frac{1}{e}$-approximate algorithm amenable to dynamic programming and efficient caching. We compare BatchBALD to the commonly used approach for batch data acquisition and find that the current approach acquires similar and redundant points, sometimes performing worse than randomly acquiring data. We finish by showing that, using BatchBALD to consider dependencies within an acquisition batch, we achieve new state of the art performance on standard benchmarks, providing substantial data efficiency improvements in batch acquisition.

artificial intelligence, batchbald, machine learning, (16 more...)

arXiv.org Machine Learning

1906.08158

Genre: Research Report (0.82)

Industry: Health & Medicine (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.71)

Add feedback

Efficient Algorithms for Set-Valued Prediction in Multi-Class Classification

Mortier, Thomas, Wydmuch, Marek, Hüllermeier, Eyke, Dembczyński, Krzysztof, Waegeman, Willem

arXiv.org Machine LearningJun-19-2019

In cases of uncertainty, a multi-class classifier preferably returns a set of candidate classes instead of predicting a single class label with little guarantee. More precisely, the classifier should strive for an optimal balance between the correctness (the true class is among the candidates) and the precision (the candidates are not too many) of its prediction. We formalize this problem within a general decision-theoretic framework that unifies most of the existing work in this area. In this framework, uncertainty is quantified in terms of conditional class probabilities, and the quality of a predicted set is measured in terms of a utility function. We then address the problem of finding the Bayes-optimal prediction, i.e., the subset of class labels with highest expected utility. For this problem, which is computationally challenging as there are exponentially (in the number of classes) many predictions to choose from, we propose efficient algorithms that can be applied to a broad family of utility scores. Two of these algorithms make use of structural information in the form of a class hierarchy, which is often available in prediction problems with many classes. Our theoretical results are complemented by experimental studies, in which we analyze the proposed algorithms in terms of predictive accuracy and runtime efficiency.

artificial intelligence, bayesian inference, machine learning, (20 more...)

arXiv.org Machine Learning

1906.08129

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.87)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.35)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.35)

Add feedback

On The Radon--Nikodym Spectral Approach With Optimal Clustering

Malyshkin, Vladislav Gennadievich

arXiv.org Machine LearningJun-19-2019

Problems of interpolation, classification, and clustering are considered. In the tenets of Radon--Nikodym approach $\langle f(\mathbf{x})\psi^2 \rangle / \langle\psi^2\rangle$, where the $\psi(\mathbf{x})$ is a linear function on input attributes, all the answers are obtained from a generalized eigenproblem $|f|\psi^{[i]}\rangle = \lambda^{[i]} |\psi^{[i]}\rangle$. The solution to the interpolation problem is a regular Radon-Nikodym derivative. The solution to the classification problem requires prior and posterior probabilities that are obtained using the Lebesgue quadrature[1] technique. Whereas in a Bayesian approach new observations change only outcome probabilities, in the Radon-Nikodym approach not only outcome probabilities but also the probability space $|\psi^{[i]}\rangle$ change with new observations. This is a remarkable feature of the approach: both the probabilities and the probability space are constructed from the data. The Lebesgue quadrature technique can be also applied to the optimal clustering problem. The problem is solved by constructing a Gaussian quadrature on the Lebesgue measure. A distinguishing feature of the Radon-Nikodym approach is the knowledge of the invariant group: all the answers are invariant relatively any non-degenerated linear transform of input vector $\mathbf{x}$ components. A software product implementing the algorithms of interpolation, classification, and optimal clustering is available from the authors.

artificial intelligence, machine learning, quadrature, (18 more...)

arXiv.org Machine Learning

1906.0046

Country: North America > United States > Wisconsin (0.15)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.66)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Add feedback