Qualitatively characterizing neural network optimization problems
Goodfellow, Ian J., Vinyals, Oriol, Saxe, Andrew M.
Training neural networks involves solving large-scale non-convex optimization problems. This task has long been believed to be extremely difficult, with fear of local minima and other obstacles motivating a variety of schemes to improve optimization, such as unsupervised pretraining. However, modern neural networks are able to achieve negligible training error on complex tasks, using only direct training with stochastic gradient descent. We introduce a simple analysis technique to look for evidence that such networks are overcoming local optima. We find that, in fact, on a straight path from initialization to solution, a variety of state-of-the-art neural networks never encounter any significant obstacles.
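The technique is simple enough to sketch in a few lines: evaluate the loss J(θ) along the straight path θ(α) = (1 − α)·θ_init + α·θ_final and look for bumps. A minimal numpy sketch, where the tiny two-layer network, the synthetic data, and the second random draw standing in for an SGD solution are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data and a tiny two-layer tanh network (illustrative stand-ins for the
# tasks and architectures studied in the paper).
X = rng.normal(size=(256, 10))
y = (X[:, 0] * X[:, 1] > 0).astype(float)

def init_params():
    return [rng.normal(scale=0.1, size=(10, 16)), rng.normal(scale=0.1, size=16)]

def loss(params):
    W, v = params
    h = np.tanh(X @ W)
    p = 1.0 / (1.0 + np.exp(-(h @ v)))
    eps = 1e-9
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

theta_init = init_params()
theta_final = init_params()   # in practice: the parameters found by SGD

# Evaluate J along the straight path theta(alpha) = (1-alpha)*theta_init + alpha*theta_final.
for alpha in np.linspace(0.0, 1.0, 11):
    theta = [(1 - alpha) * a + alpha * b for a, b in zip(theta_init, theta_final)]
    print(f"alpha = {alpha:.1f}   J = {loss(theta):.4f}")
```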
The development of an information criterion for Change-Point Analysis
Wiggins, Paul A., LaMont, Colin H.
Change-point analysis is a flexible and computationally tractable tool for the analysis of time series data from systems that transition between discrete states and whose observables are corrupted by noise. The change-point algorithm is used to identify the time indices (change points) at which the system transitions between these discrete states. We present a unified information-based approach to testing for the existence of change points. This new approach reconciles two previously disparate approaches to change-point analysis (frequentist and information-based) for testing transitions between states. The resulting method is statistically principled, parameter- and prior-free, and applicable to a wide range of change-point problems.
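A toy sketch of information-based change-point testing: score every candidate split of a noisy piecewise-constant series and accept a change point only if the penalized likelihood beats the no-change model. The BIC-style penalty below is an illustrative stand-in for the paper's information criterion, and the synthetic signal is an assumption:

```python
import numpy as np

rng = np.random.default_rng(1)

# Piecewise-constant signal with Gaussian noise and one change point at t = 60.
x = np.concatenate([rng.normal(0.0, 1.0, 60), rng.normal(1.5, 1.0, 40)])
n = len(x)

def seg_nll(seg):
    # Gaussian negative log-likelihood at the segment's MLE mean and variance.
    var = max(float(seg.var()), 1e-12)
    return 0.5 * len(seg) * (np.log(2 * np.pi * var) + 1.0)

# Score every candidate split; each segment contributes 2 parameters (mean, var).
scores = [(seg_nll(x[:t]) + seg_nll(x[t:]) + 0.5 * 4 * np.log(n), t)
          for t in range(5, n - 5)]
best_score, best_t = min(scores)

no_change = seg_nll(x) + 0.5 * 2 * np.log(n)
print(f"best candidate: t = {best_t}")
print(f"change point accepted: {best_score < no_change}")
```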
Harmonic Exponential Families on Manifolds
Cohen, Taco S., Welling, Max
In a range of fields including the geosciences, molecular biology, robotics and computer vision, one encounters problems that involve random variables on manifolds. Currently, there is a lack of flexible probabilistic models on manifolds that are fast and easy to train. We define an extremely flexible class of exponential family distributions on manifolds such as the torus, sphere, and rotation groups, and show that for these distributions the gradient of the log-likelihood can be computed efficiently using a non-commutative generalization of the Fast Fourier Transform (FFT). We discuss applications to Bayesian camera motion estimation (where harmonic exponential families serve as conjugate priors), and modelling of the spatial distribution of earthquakes on the surface of the earth. Our experimental results show that harmonic densities yield a significantly higher likelihood than the best competing method, while being orders of magnitude faster to train.
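For the circle, the abelian case, the machinery can be sketched with an ordinary FFT: the gradient of the log-likelihood needs the model moments E_p[phi_k], which are (up to scale) Fourier coefficients of the normalized density, so a single FFT yields them all. The natural-parameter values and grid size below are assumptions; the paper's non-commutative FFT plays this role on groups such as SO(3):

```python
import numpy as np

# Harmonic exponential family on the circle: p(theta) ~ exp(sum_k eta_k phi_k(theta)),
# with phi_k in {cos(k.), sin(k.)}. On this abelian group the ordinary FFT suffices.
K = 3                                              # number of harmonics (assumed)
eta = np.array([1.0, -0.5, 0.3, 0.2, 0.0, -0.1])   # cos_1..cos_K, sin_1..sin_K (assumed)

N = 512
theta = np.linspace(0, 2 * np.pi, N, endpoint=False)
ks = np.arange(1, K + 1)
Phi = np.concatenate([np.cos(np.outer(ks, theta)), np.sin(np.outer(ks, theta))])  # (2K, N)

f = eta @ Phi
log_Z = np.log(np.mean(np.exp(f)) * 2 * np.pi)     # quadrature for the log-normalizer
p = np.exp(f - log_Z)                              # normalized density on the grid

# Gradient of the log-likelihood = empirical moments - model moments E_p[phi_k];
# the model moments are (up to scale) Fourier coefficients of p.
c = np.fft.rfft(p) / N
model_moments = np.concatenate([2 * np.pi * c[1:K + 1].real, -2 * np.pi * c[1:K + 1].imag])
print("E_p[phi] via FFT:", np.round(model_moments, 4))
```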
Counterfactual Risk Minimization: Learning from Logged Bandit Feedback
Swaminathan, Adith, Joachims, Thorsten
We develop a learning principle and an efficient algorithm for batch learning from logged bandit feedback. This learning setting is ubiquitous in online systems (e.g., ad placement, web search, recommendation), where an algorithm makes a prediction (e.g., ad ranking) for a given input (e.g., query) and observes bandit feedback (e.g., user clicks on presented ads). We first address the counterfactual nature of the learning problem through propensity scoring. Next, we prove generalization error bounds that account for the variance of the propensity-weighted empirical risk estimator. These constructive bounds give rise to the Counterfactual Risk Minimization (CRM) principle. We show how CRM can be used to derive a new learning method -- called Policy Optimizer for Exponential Models (POEM) -- for learning stochastic linear rules for structured output prediction. We present a decomposition of the POEM objective that enables efficient stochastic gradient optimization. POEM is evaluated on several multi-label classification problems showing substantially improved robustness and generalization performance compared to the state-of-the-art.
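A sketch of the CRM objective on a simulated bandit log: the clipped propensity-weighted (IPS) risk plus a variance penalty. The simulated log, the linear softmax policy class, the clipping constant, and the penalty weight lambda are all assumptions; POEM would minimize this with stochastic gradients rather than merely evaluate it:

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated bandit log: contexts, actions from a uniform logging policy,
# the logged propensities, and the observed losses (all assumed for illustration).
n, d, n_actions = 1000, 5, 4
X = rng.normal(size=(n, d))
actions = rng.integers(0, n_actions, size=n)
propensities = np.full(n, 1.0 / n_actions)
losses = rng.uniform(0.0, 1.0, size=n)

def crm_objective(W, lam=0.5, clip=10.0):
    # Clipped IPS risk plus a CRM-style variance penalty (illustrative form).
    scores = X @ W                           # linear stochastic (softmax) policy
    pi = np.exp(scores - scores.max(axis=1, keepdims=True))
    pi /= pi.sum(axis=1, keepdims=True)
    w = np.minimum(pi[np.arange(n), actions] / propensities, clip)
    r = losses * w
    return r.mean() + lam * np.sqrt(r.var() / n)

W = rng.normal(scale=0.01, size=(d, n_actions))
print("CRM objective at a random policy:", round(float(crm_objective(W)), 4))
```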
Learning to Search Better Than Your Teacher
Chang, Kai-Wei, Krishnamurthy, Akshay, Agarwal, Alekh, Daumé, Hal III, Langford, John
Methods for learning to search for structured prediction typically imitate a reference policy, with existing theoretical guarantees demonstrating low regret compared to that reference. This is unsatisfactory in many applications where the reference policy is suboptimal and the goal of learning is to improve upon it. Can learning to search work even when the reference is poor? We provide a new learning to search algorithm, LOLS, which does well relative to the reference policy, but additionally guarantees low regret compared to deviations from the learned policy: a local-optimality guarantee. Consequently, LOLS can improve upon the reference policy, unlike previous algorithms. This enables us to develop structured contextual bandits, a partial information structured prediction setting with many potential applications.
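A schematic of one LOLS round on a toy sequential task: roll in with the learned policy, and at each step cost every one-step deviation by rolling out with a reference/learned mixture. The toy loss, horizon, and the deliberately poor reference policy are assumptions; real LOLS would feed the collected cost-sensitive examples to a learner and update the learned policy:

```python
import random

random.seed(0)

def task_loss(s):  return sum(s[1:])   # toy loss: prefer action 0 everywhere (assumed)
def reference(s):  return 1            # a deliberately poor reference policy
def learned(s):    return 0            # the current learned policy

def lols_round(x, beta=0.5, horizon=5, actions=(0, 1)):
    # One LOLS round (schematic): roll in with the learned policy; at each step,
    # cost every one-step deviation by rolling out with a reference/learned mixture.
    examples, state = [], (x,)
    for t in range(horizon):
        costs = {}
        for a in actions:
            s = state + (a,)
            for _ in range(t + 1, horizon):
                pol = reference if random.random() < beta else learned
                s += (pol(s),)
            costs[a] = task_loss(s)
        examples.append((state, costs))   # cost-sensitive classification example
        state += (learned(state),)        # continue the roll-in
    return examples

for state, costs in lols_round(x=0):
    print("t =", len(state) - 1, "costs:", costs)
```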
Kernel-Based Adaptive Online Reconstruction of Coverage Maps With Side Information
Kasparick, Martin, Cavalcante, Renato L. G., Valentin, Stefan, Stanczak, Slawomir, Yukawa, Masahiro
In this paper, we address the problem of reconstructing coverage maps from path-loss measurements in cellular networks. We propose and evaluate two kernel-based adaptive online algorithms as an alternative to typical offline methods. The proposed algorithms are application-tailored extensions of powerful iterative methods such as the adaptive projected subgradient method and a state-of-the-art adaptive multikernel method. Assuming that the moving trajectories of users are available, we show how side information can be incorporated into the algorithms to improve their convergence performance and the quality of the estimation. The complexity is significantly reduced by imposing sparsity awareness: the algorithms exploit the compressibility of the measurement data to reduce the amount of data that is saved and processed. Finally, we present extensive simulations based on realistic data to show that our algorithms provide fast, robust estimates of coverage maps in real-world scenarios. Envisioned applications include path-loss prediction along trajectories of mobile users as a building block for anticipatory buffering or traffic offloading.
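The flavor of the online reconstruction can be sketched with a simple kernel NLMS-style update, a stand-in for the adaptive projected subgradient method used in the paper; the synthetic path-loss field, the step size, and the crude sparsification rule are assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

def rbf(a, b, gamma=2.0):
    return np.exp(-gamma * np.sum((a - b) ** 2, axis=-1))

def true_map(p):
    # Synthetic path-loss field (dB) over the unit square (assumed ground truth).
    return 100.0 + 20.0 * np.log10(0.1 + np.linalg.norm(p - 0.5))

centers, alphas = [], []          # growing kernel dictionary
eta, tol = 0.5, 1e-3              # step size and crude sparsification threshold

for _ in range(500):
    p = rng.uniform(0, 1, size=2)                 # measurement location on a trajectory
    z = true_map(p) + rng.normal(0, 1)            # noisy path-loss measurement
    f_hat = sum(a * rbf(p, c) for a, c in zip(alphas, centers)) if centers else 0.0
    coef = eta * (z - f_hat)                      # NLMS-style update (k(p, p) = 1 for RBF)
    if abs(coef) > tol:                           # skip negligible atoms
        centers.append(p)
        alphas.append(coef)

q = np.array([0.25, 0.75])
est = sum(a * rbf(q, c) for a, c in zip(alphas, centers))
print(f"estimate {est:.1f} dB vs true {true_map(q):.1f} dB, dictionary size {len(centers)}")
```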
Variable subset selection via GA and information complexity in mixtures of Poisson and negative binomial regression models
Count data, for example the number of observed cases of a disease in a city, often arise in the fields of healthcare analytics and epidemiology. In this paper, we consider performing regression on multivariate data in which our outcome is a count. Specifically, we derive log-likelihood functions for finite mixtures of regression models involving counts that come from a Poisson distribution, as well as a negative binomial distribution when the counts are significantly overdispersed. Within our proposed modeling framework, we carry out optimal component selection using the information criteria scores AIC, BIC, CAIC, and ICOMP. We demonstrate applications of our approach on simulated data, as well as on a real data set of HIV cases in Tennessee counties from the year 2010. Finally, using a genetic algorithm within our framework, we perform variable subset selection to determine the covariates that are most responsible for categorizing Tennessee counties. This leads to some interesting insights into the traits of counties that have high HIV counts.
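A single-component sketch of the building block: fit a Poisson regression by Newton-Raphson and score it with AIC, BIC, and CAIC. The mixture machinery, the negative binomial branch, ICOMP, and the genetic algorithm are omitted, and the synthetic counts are an assumption:

```python
import numpy as np
from math import lgamma

rng = np.random.default_rng(4)

# Synthetic counts: y ~ Poisson(exp(X beta)), a stand-in for county-level counts.
n, d = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, d - 1))])
beta_true = np.array([1.0, 0.5, -0.3])
y = rng.poisson(np.exp(X @ beta_true))

# Fit by Newton-Raphson on the Poisson log-likelihood.
beta = np.zeros(d)
for _ in range(25):
    mu = np.exp(X @ beta)
    beta += np.linalg.solve(X.T @ (X * mu[:, None]), X.T @ (y - mu))

mu = np.exp(X @ beta)
loglik = float(np.sum(y * np.log(mu) - mu)) - sum(lgamma(v + 1) for v in y)

k = d  # number of free parameters
aic = -2 * loglik + 2 * k
bic = -2 * loglik + k * np.log(n)
caic = -2 * loglik + k * (np.log(n) + 1)
print("beta:", np.round(beta, 3), f"AIC={aic:.1f} BIC={bic:.1f} CAIC={caic:.1f}")
```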
Extrinsic Methods for Coding and Dictionary Learning on Grassmann Manifolds
Harandi, Mehrtash, Hartley, Richard, Shen, Chunhua, Lovell, Brian, Sanderson, Conrad
Sparsity-based representations have recently led to notable results in various visual recognition tasks. In a separate line of research, Riemannian manifolds have been shown useful for dealing with features and models that do not lie in Euclidean spaces. With the aim of building a bridge between the two realms, we address the problem of sparse coding and dictionary learning over the space of linear subspaces, which form Riemannian structures known as Grassmann manifolds. To this end, we propose to embed Grassmann manifolds into the space of symmetric matrices by an isometric mapping. This in turn enables us to extend two sparse coding schemes to Grassmann manifolds. Furthermore, we propose closed-form solutions for learning a Grassmann dictionary, atom by atom. Lastly, to handle non-linearity in data, we extend the proposed Grassmann sparse coding and dictionary learning algorithms through embedding into Hilbert spaces. Experiments on several classification tasks (gender recognition, gesture classification, scene analysis, face recognition, action recognition and dynamic texture classification) show that the proposed approaches achieve considerable improvements in discrimination accuracy, in comparison to state-of-the-art methods such as kernelized Affine Hull Method and graph-embedding Grassmann discriminant analysis.
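The embedding step is easy to sketch: represent a subspace span(U) by its projection matrix UU^T and run ordinary sparse coding in that Euclidean space. The dictionary of random subspaces and the plain OMP (in place of the paper's Grassmann-specific coding schemes and closed-form atom updates) are assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)

def random_subspace(p, q):
    # Orthonormal basis of a random q-dimensional subspace of R^p.
    Q, _ = np.linalg.qr(rng.normal(size=(p, q)))
    return Q

def embed(U):
    # Grassmann point span(U) -> symmetric projection matrix U U^T.
    return U @ U.T

p, q, n_atoms = 8, 2, 12
dictionary = np.stack([embed(random_subspace(p, q)).ravel()
                       for _ in range(n_atoms)]).T           # (p*p, n_atoms)

# Sparse-code a query subspace in the embedding space with plain OMP (3 atoms).
x = embed(random_subspace(p, q)).ravel()
residual, support = x.copy(), []
for _ in range(3):
    support.append(int(np.argmax(np.abs(dictionary.T @ residual))))
    D_s = dictionary[:, support]
    coef, *_ = np.linalg.lstsq(D_s, x, rcond=None)
    residual = x - D_s @ coef
print("selected atoms:", support,
      "| residual norm:", round(float(np.linalg.norm(residual)), 4))
```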
Posterior Contraction Rates of the Phylogenetic Indian Buffet Processes
Chen, Mengjie, Gao, Chao, Zhao, Hongyu
By expressing prior distributions as general stochastic processes, nonparametric Bayesian methods provide a flexible way to incorporate prior knowledge and constrain the latent structure in statistical inference. The Indian buffet process (IBP) is one such example, which can be used to define a prior distribution on infinite binary features, where exchangeability among subjects is assumed. The phylogenetic Indian buffet process (pIBP), a derivative of the IBP, enables the modeling of non-exchangeability among subjects through a stochastic process on a rooted tree, similar to that used in phylogenetics, to describe relationships among the subjects. In this paper, we study the theoretical properties of the IBP and pIBP under a binary factor model. We establish posterior contraction rates for both the IBP and pIBP and substantiate the theoretical results through simulation studies. This is the first work addressing the frequentist properties of the posterior behavior of the IBP and pIBP. We also demonstrate the practical usefulness of our results by applying the pIBP prior to a real data example arising in cancer genomics, where exchangeability among subjects is violated.
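For readers unfamiliar with the prior being analyzed, here is a sketch of drawing a binary feature matrix from the (exchangeable) IBP via the standard buffet construction; the pIBP would additionally couple rows through a stochastic process on a rooted tree, which is omitted here:

```python
import numpy as np

rng = np.random.default_rng(6)

def sample_ibp(n_subjects, alpha=2.0):
    # Draw a binary feature matrix Z from the Indian buffet process:
    # subject i takes existing feature k with probability m_k / i, then
    # opens Poisson(alpha / i) new features.
    dishes, rows = [], []
    for i in range(1, n_subjects + 1):
        row = [int(rng.random() < m / i) for m in dishes]
        for k, z in enumerate(row):
            dishes[k] += z
        new = rng.poisson(alpha / i)
        row += [1] * new
        dishes += [1] * new
        rows.append(row)
    Z = np.zeros((n_subjects, len(dishes)), dtype=int)
    for i, row in enumerate(rows):
        Z[i, :len(row)] = row
    return Z

print(sample_ibp(10))
```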
Multi-task additive models with shared transfer functions based on dictionary learning
Fawzi, Alhussein, Sinn, Mathieu, Frossard, Pascal
Additive models form a widely used class of regression models which represent the relation between covariates and response variables as a sum of low-dimensional transfer functions. Besides flexibility and accuracy, a key benefit of these models is their interpretability: the transfer functions provide visual means for inspecting the models and identifying domain-specific relations between inputs and outputs. However, in large-scale problems involving the prediction of many related tasks, learning additive models independently results in a loss of model interpretability and can cause overfitting when training data is scarce. We introduce a novel multi-task learning approach which provides a corpus of accurate and interpretable additive models for a large number of related forecasting tasks. Our key idea is to share transfer functions across models in order to reduce the model complexity and ease the exploration of the corpus. We establish a connection with sparse dictionary learning and propose a new efficient fitting algorithm which alternates between sparse coding and transfer function updates. The former step is solved via an extension of Orthogonal Matching Pursuit, whose properties are analyzed using a novel recovery condition which extends existing results in the literature. The latter step is addressed using a traditional dictionary update rule. Experiments on real-world data demonstrate that our approach compares favorably to baseline methods while yielding an interpretable corpus of models, revealing structure among the individual tasks and being more robust when training data is scarce. Our framework therefore extends the well-known benefits of additive models to common regression settings possibly involving thousands of tasks.
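The alternating scheme can be sketched on a single-covariate toy problem: transfer functions sampled on a grid are sparse combinations of shared atoms, and the fit alternates plain OMP (in place of the paper's extended variant) with a MOD-style least-squares dictionary update. All sizes and the synthetic tasks are assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy setup: each task's transfer function, sampled on a grid, is a sparse
# combination of shared atoms (all sizes and signals are assumed).
grid_n, n_atoms, n_tasks, sparsity = 50, 8, 30, 2
x = np.linspace(0, 1, grid_n)
D_true = np.stack([np.sin(2 * np.pi * (k + 1) * x) for k in range(n_atoms)], axis=1)
A_true = np.where(rng.random((n_atoms, n_tasks)) < 0.25,
                  rng.normal(size=(n_atoms, n_tasks)), 0.0)
Y = D_true @ A_true + 0.05 * rng.normal(size=(grid_n, n_tasks))

def omp(D, y, k):
    # Plain Orthogonal Matching Pursuit (the paper uses an extended variant).
    residual, support = y.copy(), []
    code = np.zeros(D.shape[1])
    for _ in range(k):
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        sol, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ sol
    code[support] = sol
    return code

D = rng.normal(size=(grid_n, n_atoms))
for _ in range(15):
    A = np.column_stack([omp(D, Y[:, t], sparsity) for t in range(n_tasks)])  # sparse coding
    D = Y @ np.linalg.pinv(A)                             # MOD-style dictionary update
    D /= np.linalg.norm(D, axis=0, keepdims=True) + 1e-12

A = np.column_stack([omp(D, Y[:, t], sparsity) for t in range(n_tasks)])
print("reconstruction RMSE:", round(float(np.sqrt(np.mean((Y - D @ A) ** 2))), 4))
```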