AITopics

1605.02869

Genre: Research Report (0.82)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)

@machinelearnbot Sep-23-2016, 11:45:11 GMT

Need for DYNAMICAL Machine Learning: Bayesian exact recursive estimation

In my recent blog post, Marrying Kalman Filtering & Machine Learning, we saw the merger of Bayesian exact recursive estimation (whose algorithm in the linear, Gaussian case is the Kalman Filter/Smoother) and Machine Learning. We developed a solution called Kernel Projection Kalman Filter for business applications that require static or dynamical, dynamical or time-varying dynamical, linear or non-linear Machine Learning, i.e., pretty much all applications; the Kernel Projection Kalman Filter is therefore a "universal" solution . . . But who needs anything more than STATIC Machine Learning (ML)? Indeed, university courses in ML largely teach static ML: given a set of inputs and outputs, find a static map between the two during supervised "Training" and use this static map for business purposes during "Operation" (which is called "Testing" during pre-operation evaluation).
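As a rough illustration of what exact recursive estimation looks like in the linear, Gaussian case, here is a minimal scalar Kalman filter sketch in Python. It assumes a random-walk state observed with additive Gaussian noise. The function name `kalman_1d` and the parameters `q`, `r`, `x0`, `p0` are illustrative choices, and the sketch is not the Kernel Projection Kalman Filter described in the post.

```python
def kalman_1d(observations, q=0.01, r=1.0, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a random-walk state (illustrative sketch).

    Assumed model (not taken from the post):
        state:        x_t = x_{t-1} + w_t,  w_t ~ N(0, q)
        observation:  z_t = x_t + v_t,      v_t ~ N(0, r)
    Returns a list of (posterior mean, posterior variance) pairs.
    """
    x, p = x0, p0
    estimates = []
    for z in observations:
        # Predict: the random walk keeps the mean and inflates the variance.
        p = p + q
        # Update: blend prediction and observation using the Kalman gain.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append((x, p))
    return estimates


if __name__ == "__main__":
    # A constant signal at 5.0; the estimate should move toward it.
    for mean, var in kalman_1d([5.0] * 5):
        print(round(mean, 3), round(var, 3))
```

Unlike a static map fitted once during training, the estimate here is revised with every new observation, which is the "dynamical" behavior the post argues for.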

artificial intelligence, learning, machine learning, (10 more...)

@machinelearnbot

Industry: Health & Medicine > Therapeutic Area (0.36)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.40)

Ma, Kai-Chieh, Liu, Lantao, Sukhatme, Gaurav S.

Informative Planning and Online Learning with Sparse Gaussian Processes

arXiv.org Machine Learning Sep-23-2016

A big challenge in environmental monitoring is the spatiotemporal variation of the phenomena to be observed. To enable persistent sensing and estimation in such a setting, it is beneficial to have a time-varying underlying environmental model. Here we present a planning and learning method that enables an autonomous marine vehicle to perform persistent ocean monitoring tasks by learning and refining an environmental model. To alleviate the computational bottleneck caused by the large volume of accumulated data, we propose a framework that iterates between a planning component aimed at collecting the most information-rich data and a sparse Gaussian Process learning component, in which the environmental model and hyperparameters are learned online using only the subset of data that contributes most. Our simulations with ground-truth ocean data show that the proposed method is both accurate and efficient.
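The idea of keeping only the most useful subset of data can be sketched with a simple stand-in: greedily pick the points where a Gaussian Process is currently most uncertain. This is a generic subset-of-data heuristic, not the selection rule or planner from the paper. The RBF kernel, the noise level, and the names `rbf`, `solve`, `posterior_variance`, and `greedy_subset` are all assumptions made for illustration.

```python
import math


def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel on scalars (assumed kernel choice)."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def posterior_variance(x_star, chosen, noise=0.01):
    """GP predictive variance at x_star given noisy observations at `chosen`."""
    if not chosen:
        return rbf(x_star, x_star)
    K = [[rbf(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(chosen)]
         for i, a in enumerate(chosen)]
    k_star = [rbf(x_star, a) for a in chosen]
    w = solve(K, k_star)
    return rbf(x_star, x_star) - sum(ks * wi for ks, wi in zip(k_star, w))


def greedy_subset(xs, m, noise=0.01):
    """Pick m indices, each time taking the point with the largest variance."""
    chosen_idx = []
    for _ in range(m):
        remaining = [i for i in range(len(xs)) if i not in chosen_idx]
        best = max(
            remaining,
            key=lambda i: posterior_variance(
                xs[i], [xs[j] for j in chosen_idx], noise),
        )
        chosen_idx.append(best)
    return chosen_idx


if __name__ == "__main__":
    # Three clustered locations and two distant ones.
    print(greedy_subset([0.0, 0.05, 0.1, 5.0, 10.0], m=3))
```

The selected points spread across the input range, because a location close to an already chosen one adds little new information under this kernel.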

environmental monitoring, hyperparameter, information, (13 more...)

1609.0756

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.64)

Industry:

Education > Educational Setting > Online (0.42)
Transportation (0.36)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.49)
(4 more...)

Arvanitidis, Georgios, Hansen, Lars Kai, Hauberg, Søren

A Locally Adaptive Normal Distribution

arXiv.org Machine Learning Sep-23-2016

The multivariate normal density is a monotonic function of the distance to the mean, and its ellipsoidal shape is due to the underlying Euclidean metric. We suggest replacing this metric with a locally adaptive, smoothly changing (Riemannian) metric that favors regions of high local density. The resulting locally adaptive normal distribution (LAND) is a generalization of the normal distribution to the "manifold" setting, where data is assumed to lie near a potentially low-dimensional manifold embedded in $\mathbb{R}^D$. The LAND is parametric, depending only on a mean and a covariance, and is the maximum entropy distribution under the given metric. The underlying metric is, however, non-parametric. We develop a maximum likelihood algorithm to infer the distribution parameters that relies on a combination of gradient descent and Monte Carlo integration. We further extend the LAND to mixture models, and provide the corresponding EM algorithm. We demonstrate the efficiency of the LAND in fitting non-trivial probability distributions over both synthetic data and EEG measurements of human sleep.
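The first sentence of the abstract can be written out explicitly. The Euclidean form below is the standard multivariate normal. The second line is only a hedged sketch of how a Riemannian analogue might be written: the logarithm map $\operatorname{Log}_{\mu}$ and the normalization constant $\mathcal{C}$ are notation assumed here, not quoted from the abstract.

```latex
% Euclidean case: the density depends on x only through the
% (Mahalanobis) distance to the mean.
p(x \mid \mu, \Sigma) \;\propto\;
  \exp\!\left(-\tfrac{1}{2}\,(x-\mu)^{\top}\Sigma^{-1}(x-\mu)\right)

% Assumed Riemannian analogue: the straight-line offset x - \mu is replaced
% by the logarithm map at \mu, whose length is the geodesic distance
% under the locally adaptive metric.
p_{\mathcal{M}}(x \mid \mu, \Sigma) \;=\; \frac{1}{\mathcal{C}}\,
  \exp\!\left(-\tfrac{1}{2}\,
    \big\langle \operatorname{Log}_{\mu}(x),\;
                \Sigma^{-1}\operatorname{Log}_{\mu}(x) \big\rangle\right)
```

Under this reading, the normalization constant generally has no closed form, which would be consistent with the abstract's use of Monte Carlo integration.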

artificial intelligence, machine learning, manifold, (19 more...)

1606.02518

Country:

North America > United States (0.67)
Europe (0.46)

Genre: Research Report (0.40)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.49)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.35)

Hughes, Michael C., Sudderth, Erik B.

Fast Learning of Clusters and Topics via Sparse Posteriors

arXiv.org Machine Learning Sep-23-2016

Mixture models and topic models generate each observation from a single cluster, but standard variational posteriors for each observation assign positive probability to all possible clusters. This requires dense storage and runtime costs that scale with the total number of clusters, even though typically only a few clusters have significant posterior mass for any data point. We propose a constrained family of sparse variational distributions that allow at most $L$ non-zero entries, where the tunable threshold $L$ trades off speed for accuracy. Previous sparse approximations have used hard assignments ($L=1$), but we find that moderate values of $L>1$ provide superior performance. Our approach easily integrates with stochastic or incremental optimization algorithms to scale to millions of examples. Experiments training mixture models of image patches and topic models for news articles show that our approach produces better-quality models in far less time than baseline methods.
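To make the "at most $L$ non-zero entries" constraint concrete, the sketch below turns a dense vector of cluster responsibilities into an $L$-sparse one. It assumes the simplest construction, keeping the $L$ largest entries and renormalizing them; the abstract does not state the exact rule, and the function name `sparsify_responsibilities` is hypothetical.

```python
def sparsify_responsibilities(resp, L):
    """Keep the L largest responsibilities and renormalize them to sum to 1.

    `resp` is a dense list of non-negative cluster probabilities for one
    observation. L = 1 reduces to a hard assignment.
    """
    if not 1 <= L <= len(resp):
        raise ValueError("L must be between 1 and the number of clusters")
    # Indices of the L largest entries (ties broken by original order).
    top = sorted(range(len(resp)), key=lambda k: resp[k], reverse=True)[:L]
    total = sum(resp[k] for k in top)
    if total <= 0.0:
        raise ValueError("selected responsibilities must have positive mass")
    sparse = [0.0] * len(resp)
    for k in top:
        sparse[k] = resp[k] / total
    return sparse


if __name__ == "__main__":
    dense = [0.5, 0.3, 0.15, 0.05]
    print(sparsify_responsibilities(dense, L=2))
    print(sparsify_responsibilities(dense, L=1))
```

Only the $L$ retained entries need to be stored and updated per observation, which is where the storage and runtime savings described in the abstract would come from.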

artificial intelligence, machine learning, natural language, (21 more...)

1609.07521

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(3 more...)

#artificialintelligence Sep-22-2016, 19:56:28 GMT

The Three Faces of Bayes

Last summer, I was at a conference having lunch with Hal Daumé III when we got to talking about how "Bayesian" can be a funny and ambiguous term. It seems like the definition should be straightforward: "following the work of English mathematician Rev. Thomas Bayes," perhaps, or even "uses Bayes' theorem." But many methods bearing the reverend's name or using his theorem aren't even considered "Bayesian" by his most religious followers. Why is it that Bayesian networks, for example, aren't considered… y'know… Bayesian? As I've read more outside the fields of machine learning and natural language processing -- from psychometrics and environmental biology to hackers who dabble in data science -- I've noticed three broad uses of the term "Bayesian."

artificial intelligence, bayesian, machine learning, (16 more...)

#artificialintelligence

Country: Asia > Middle East > Jordan (0.05)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Lee, Young, Lim, Kar Wai, Ong, Cheng Soon

Hawkes Processes with Stochastic Excitations

We propose an extension to Hawkes processes by treating the levels of self-excitation as a stochastic differential equation. Our new point process allows better approximation in application domains where events and intensities accelerate each other with correlated levels of contagion. We generalize a recent algorithm for simulating draws from Hawkes processes whose levels of excitation are stochastic processes, and propose a hybrid Markov chain Monte Carlo approach for model fitting. Our sampling procedure scales linearly with the number of required events and does not require stationarity of the point process. A modular inference procedure consisting of a combination of Gibbs and Metropolis-Hastings steps is put forward. We recover expectation maximization as a special case. Our general approach is illustrated for contagion following geometric Brownian motion and exponential Langevin dynamics.
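For readers unfamiliar with the baseline being extended, here is a minimal sketch of a classical Hawkes process simulated by thinning. It assumes a constant excitation level `alpha` and an exponential kernel with decay `beta`, so it is the standard model, not the stochastic-excitation process or the simulation algorithm proposed in the paper. The function name and parameters are illustrative.

```python
import math
import random


def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Simulate event times on [0, horizon) by thinning.

    Assumed intensity: lambda(t) = mu + sum_i alpha * exp(-beta * (t - t_i)),
    summed over past events t_i < t, with mu > 0.
    """
    rng = random.Random(seed)
    events = []
    t = 0.0
    excitation = 0.0  # current value of the self-exciting sum
    while True:
        # Between events the intensity only decays, so its current value
        # is a valid upper bound until the next accepted event.
        lam_bar = mu + excitation
        wait = rng.expovariate(lam_bar)
        if t + wait >= horizon:
            break
        t += wait
        excitation *= math.exp(-beta * wait)
        # Accept the candidate with probability lambda(t) / lam_bar.
        if rng.random() * lam_bar <= mu + excitation:
            events.append(t)
            excitation += alpha  # each event raises the intensity
    return events


if __name__ == "__main__":
    times = simulate_hawkes(mu=1.0, alpha=0.5, beta=1.0, horizon=50.0)
    print(len(times), "events")
```

In the paper's setting, the fixed jump `alpha` would instead follow a stochastic process such as geometric Brownian motion; that part is not attempted here.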

bayesian inference, hawkes process, upstream oil & gas, (19 more...)

1609.06831

Country:

Europe (0.28)
North America > United States > New York (0.14)
Oceania > Australia (0.14)
(2 more...)

Genre: Research Report (0.40)

Industry:

Government > Regional Government (0.46)
Energy > Oil & Gas > Upstream (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.89)

Hennig, Philipp, Garnett, Roman

Exact Sampling from Determinantal Point Processes

Determinantal point processes (DPPs) are an important concept in random matrix theory and combinatorics. They have also recently attracted interest in the study of numerical methods for machine learning, as they offer an elegant "missing link" between independent Monte Carlo sampling and deterministic evaluation on regular grids, applicable to a general set of spaces. This is helpful whenever an algorithm *explores* to reduce uncertainty, such as in active learning, Bayesian optimization, reinforcement learning, and marginalization in graphical models. To draw samples from a DPP in practice, existing literature focuses on approximate schemes of low cost, or comparatively inefficient exact algorithms like rejection sampling. We point out that, for many settings of relevance to machine learning, it is also possible to draw *exact* samples from DPPs on continuous domains. We start from an intuitive example on the real line, which is then generalized to multivariate real vector spaces. We also compare to previously studied approximations, showing that exact sampling, despite higher cost, can be preferable where precision is needed.
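The repulsive behavior that places DPPs between independent sampling and regular grids can be seen in a small finite example. The sketch below uses the standard L-ensemble formula, P(A) = det(L_A) / det(L + I), on three points of the real line. It is a discrete toy case, not the continuous-domain exact sampler the paper describes; the Gaussian kernel, its length scale, and all function names are assumptions.

```python
import math
from itertools import combinations


def det(m):
    """Determinant by Gaussian elimination (the empty matrix has det 1)."""
    a = [row[:] for row in m]
    n = len(a)
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d


def l_ensemble(points, length_scale=0.5):
    """Gaussian similarity kernel between scalar points (assumed choice)."""
    return [[math.exp(-((a - b) ** 2) / (2 * length_scale ** 2))
             for b in points] for a in points]


def subset_probability(L, subset):
    """P(A) = det(L_A) / det(L + I) for an L-ensemble DPP."""
    n = len(L)
    sub = [[L[i][j] for j in subset] for i in subset]
    l_plus_i = [[L[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
                for i in range(n)]
    return det(sub) / det(l_plus_i)


if __name__ == "__main__":
    L = l_ensemble([0.0, 0.1, 1.0])
    for size in range(4):
        for subset in combinations(range(3), size):
            print(subset, round(subset_probability(L, subset), 4))
```

The pair of nearby points (0.0 and 0.1) receives far less probability than the pair of distant points (0.0 and 1.0), which is the diversity-seeking property that makes DPPs attractive for exploration.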

artificial intelligence, dpp, machine learning, (15 more...)

1609.0684

Country:

Europe (0.46)
North America > United States (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

Magrini, Alessandro, Luciani, Davide, Stefanini, Federico Mattia

A probabilistic network for the diagnosis of acute cardiopulmonary diseases

We describe our experience in the development of a probabilistic network for the diagnosis of acute cardiopulmonary diseases. A panel of expert physicians collaborated to specify the qualitative part, that is, a directed acyclic graph defining a factorization of the joint probability distribution of domain variables. The quantitative part, that is, the set of all conditional probability distributions defined by each factor, was estimated in the Bayesian paradigm: we applied a special formal representation, characterized by a low number of parameters and a parameterization intelligible to physicians, elicited the joint prior distribution of parameters from medical experts, and updated it by conditioning on a dataset of hospital patient records using Markov Chain Monte Carlo simulation. Refinement was cyclically performed until the probabilistic network provided satisfactory Concordance Index values for a selection of acute diseases and reasonable inference on six fictitious patient cases. The probabilistic network can be employed to perform medical diagnosis on a total of 63 diseases (38 acute and 25 chronic) on the basis of up to 167 patient findings.
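The split between a qualitative part (the graph) and a quantitative part (the conditional distributions) can be illustrated with a toy network of one disease and two findings. Every number and variable name below is hypothetical and chosen only for illustration; none comes from the paper, whose network is far larger and whose parameters were estimated with MCMC.

```python
from itertools import product

# Qualitative part (assumed graph): disease -> fever, disease -> cough.
# Quantitative part (hypothetical conditional probability tables).
P_DISEASE = {True: 0.01, False: 0.99}
P_FEVER_GIVEN_DISEASE = {True: 0.90, False: 0.05}  # P(fever=True | disease)
P_COUGH_GIVEN_DISEASE = {True: 0.70, False: 0.10}  # P(cough=True | disease)


def joint(disease, fever, cough):
    """Joint probability as the product of the factors defined by the graph."""
    p_fever = P_FEVER_GIVEN_DISEASE[disease]
    p_cough = P_COUGH_GIVEN_DISEASE[disease]
    return (
        P_DISEASE[disease]
        * (p_fever if fever else 1.0 - p_fever)
        * (p_cough if cough else 1.0 - p_cough)
    )


def posterior_disease(fever, cough):
    """P(disease=True | findings), obtained by enumerating the joint."""
    with_disease = joint(True, fever, cough)
    without_disease = joint(False, fever, cough)
    return with_disease / (with_disease + without_disease)


if __name__ == "__main__":
    total = sum(joint(d, f, c) for d, f, c in product([True, False], repeat=3))
    print("joint sums to", round(total, 6))
    print("P(disease | fever, cough) =", round(posterior_disease(True, True), 4))
```

Enumeration is only feasible for a handful of variables; a network with 63 diseases and 167 findings would need the structured inference that the factorization makes possible.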

artificial intelligence, machine learning, semiotic, (15 more...)

1609.06864

Country:

Europe > United Kingdom > England (0.46)
Europe > Italy (0.28)

Genre: Research Report > Experimental Study (0.46)

Industry:

Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (1.00)
Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Advani, Madhu, Ganguli, Surya

An equivalence between high dimensional Bayes optimal inference and M-estimation

When recovering an unknown signal from noisy measurements, the computational difficulty of performing optimal Bayesian MMSE (minimum mean squared error) inference often necessitates the use of maximum a posteriori (MAP) inference, a special case of regularized M-estimation, as a surrogate. However, MAP is suboptimal in high dimensions, when the number of unknown signal components is similar to the number of measurements. In this work we demonstrate, when the signal distribution and the likelihood function associated with the noise are both log-concave, that optimal MMSE performance is asymptotically achievable via another M-estimation procedure. This procedure involves minimizing convex loss and regularizer functions that are nonlinearly smoothed versions of the widely applied MAP optimization problem. Our findings provide a new heuristic derivation and interpretation for recent optimal M-estimators found in the setting of linear measurements and additive noise, and further extend these results to nonlinear measurements with non-additive noise. We numerically demonstrate superior performance of our optimal M-estimators relative to MAP. Overall, at the heart of our work is the revelation of a remarkable equivalence between two seemingly very different computational problems: namely that of high dimensional Bayesian integration underlying MMSE inference, and high dimensional convex optimization underlying M-estimation. In essence we show that the former difficult integral may be computed by solving the latter, simpler optimization problem.
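The difference between MAP and MMSE estimates is already visible in one dimension. The sketch below assumes a scalar signal with a Laplace prior observed under Gaussian noise, for which MAP reduces to soft-thresholding, and computes the posterior mean by brute-force numerical integration. It is a toy case meant only to show that the two estimators disagree; it does not reproduce the paper's high-dimensional analysis or its smoothed M-estimators, and all names and parameter values are assumptions.

```python
import math


def map_estimate(y, sigma=1.0, b=1.0):
    """MAP estimate for y = x + N(0, sigma^2) with prior p(x) ~ exp(-|x| / b).

    Minimizing (y - x)^2 / (2 sigma^2) + |x| / b gives soft-thresholding
    of y at the level sigma^2 / b.
    """
    threshold = sigma ** 2 / b
    return math.copysign(max(abs(y) - threshold, 0.0), y)


def mmse_estimate(y, sigma=1.0, b=1.0, lo=-20.0, hi=20.0, n=40001):
    """Posterior mean E[x | y], approximated by a Riemann sum on a grid."""
    step = (hi - lo) / (n - 1)
    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        x = lo + i * step
        # Unnormalized posterior: Gaussian likelihood times Laplace prior.
        weight = math.exp(-0.5 * ((y - x) / sigma) ** 2 - abs(x) / b)
        numerator += x * weight
        denominator += weight
    return numerator / denominator


if __name__ == "__main__":
    for y in (0.5, 1.5, 3.0):
        print(y, map_estimate(y), round(mmse_estimate(y), 4))
```

For small observations the MAP estimate is exactly zero while the posterior mean is not, a one-dimensional hint at why MAP can be a poor surrogate for MMSE inference.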

artificial intelligence, inference, machine learning, (18 more...)

1609.0706

Country: North America > United States (0.14)

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)