AITopics

Collaborative filtering (CF) is a successful approach commonly used by many recommender systems. Conventional CF-based methods use the ratings given to items by users as the sole source of information for learning to make recommendation. However, the ratings are often very sparse in many applications, causing CF-based methods to degrade significantly in their recommendation performance. To address this sparsity problem, auxiliary information such as item content information may be utilized. Collaborative topic regression (CTR) is an appealing recent method taking this approach which tightly couples the two components that learn from two different sources of information. Nevertheless, the latent representation learned by CTR may not be very effective when the auxiliary information is very sparse. To address this problem, we generalize recent advances in deep learning from i.i.d. input to non-i.i.d. (CF-based) input and propose in this paper a hierarchical Bayesian model called collaborative deep learning (CDL), which jointly performs deep representation learning for the content information and collaborative filtering for the ratings (feedback) matrix. Extensive experiments on three real-world datasets from different domains show that CDL can significantly advance the state of the art.

cdl, deep learning, neural network, (23 more...)

1409.2944

Country: North America > United States (0.46)

Genre: Research Report (0.64)

Industry:

Leisure & Entertainment (0.94)
Information Technology (0.70)
Media > Film (0.69)
Health & Medicine (0.69)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)

Andreas, Jacob, Rabinovich, Maxim, Klein, Dan, Jordan, Michael I.

On the accuracy of self-normalized log-linear models

Calculation of the log-normalizer is a major computational obstacle in applications of log-linear models with large output spaces. The problem of fast normalizer computation has therefore attracted significant attention in the theoretical and applied machine learning literature. In this paper, we analyze a recently proposed technique known as "self-normalization", which introduces a regularization term in training to penalize log normalizers for deviating from zero. This makes it possible to use unnormalized model scores as approximate probabilities. Empirical evidence suggests that self-normalization is extremely effective, but a theoretical understanding of why it should work, and how generally it can be applied, is largely lacking. We prove generalization bounds on the estimated variance of normalizers and upper bounds on the loss in accuracy due to self-normalization, describe classes of input distributions that self-normalize easily, and construct explicit examples of high-variance input distributions. Our theoretical results make predictions about the difficulty of fitting self-normalized models to several classes of distributions, and we conclude with empirical validation of these predictions.

artificial intelligence, log-linear model, neural network, (18 more...)

1506.04147

Country: North America > United States > California (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.46)

Nobandegani, Ardavan Salehi, Psaromiligkos, Ioannis N.

Multi-Context Models for Reasoning under Partial Knowledge: Generative Process and Inference Grammar

Arriving at the complete probabilistic knowledge of a domain, i.e., learning how all variables interact, is indeed a demanding task. In reality, settings often arise for which an individual merely possesses partial knowledge of the domain, and yet, is expected to give adequate answers to a variety of posed queries. That is, although precise answers to some queries, in principle, cannot be achieved, a range of plausible answers is attainable for each query given the available partial knowledge. In this paper, we propose the Multi-Context Model (MCM), a new graphical model to represent the state of partial knowledge as to a domain. MCM is a middle ground between Probabilistic Logic, Bayesian Logic, and Probabilistic Graphical Models. For this model we discuss: (i) the dynamics of constructing a contradiction-free MCM, i.e., to form partial beliefs regarding a domain in a gradual and probabilistically consistent way, and (ii) how to perform inference, i.e., to evaluate a probability of interest involving some variables of the domain.

artificial intelligence, knowledge, mcm, (18 more...)

1412.4271

Country: North America > Canada > Quebec > Montreal (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Huang, Furong, Anandkumar, Animashree

Convolutional Dictionary Learning through Tensor Factorization

Tensor methods have emerged as a powerful paradigm for consistent learning of many latent variable models such as topic models, independent component analysis and dictionary learning. Model parameters are estimated via CP decomposition of the observed higher order input moments. However, in many domains, additional invariances such as shift invariances exist, enforced via models such as convolutional dictionary learning. In this paper, we develop novel tensor decomposition algorithms for parameter estimation of convolutional models. Our algorithm is based on the popular alternating least squares method, but with efficient projections onto the space of stacked circulant matrices. Our method is embarrassingly parallel and consists of simple operations such as fast Fourier transforms and matrix multiplications. Our algorithm converges to the dictionary much faster and more accurately compared to the alternating minimization over filters and activation maps.

artificial intelligence, machine learning, matrix, (14 more...)

1506.03509

Country: North America > United States > California (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

A simple application of FIC to model selection

Wiggins, Paul A.

We have recently proposed a new information-based approach to model selection, the Frequentist Information Criterion (FIC), that reconciles information-based and frequentist inference. The purpose of this current paper is to provide a simple example of the application of this criterion and a demonstration of the natural emergence of model complexities with both AIC-like ($N^0$) and BIC-like ($\log N$) scaling with observation number $N$. The application developed is deliberately simplified to make the analysis analytically tractable.

artificial intelligence, complexity, machine learning, (14 more...)

1506.06129

Country: North America > United States (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Runge, Jakob, Donner, Reik V., Kurths, Jürgen

Optimal model-free prediction from multivariate time series

Forecasting a time series from multivariate predictors constitutes a challenging problem, especially using model-free approaches. Most techniques, such as nearest-neighbor prediction, quickly suffer from the curse of dimensionality and overfitting for more than a few predictors which has limited their application mostly to the univariate case. Therefore, selection strategies are needed that harness the available information as efficiently as possible. Since often the right combination of predictors matters, ideally all subsets of possible predictors should be tested for their predictive power, but the exponentially growing number of combinations makes such an approach computationally prohibitive. Here a prediction scheme that overcomes this strong limitation is introduced utilizing a causal pre-selection step which drastically reduces the number of possible predictors to the most predictive set of causal drivers making a globally optimal search scheme tractable. The information-theoretic optimality is derived and practical selection criteria are discussed. As demonstrated for multivariate nonlinear stochastic delay processes, the optimal scheme can even be less computationally expensive than commonly used sub-optimal schemes like forward selection. The method suggests a general framework to apply the optimal model-free approach to select variables and subsequently fit a model to further improve a prediction or learn statistical dependencies. The performance of this framework is illustrated on a climatological index of El Ni\~no Southern Oscillation.

artificial intelligence, machine learning, predictor, (19 more...)

doi: 10.1103/PhysRevE.91.052909

1506.05822

Country:

Europe > Germany (0.46)
North America > United States (0.28)

Genre: Research Report (0.82)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Asteris, Megasthenis, Kyrillidis, Anastasios, Dimakis, Alexandros G., and, Han-Gyol Yi, Chandrasekaran, Bharath

Stay on path: PCA along graph paths

We introduce a variant of (sparse) PCA in which the set of feasible support sets is determined by a graph. In particular, we consider the following setting: given a directed acyclic graph $G$ on $p$ vertices corresponding to variables, the non-zero entries of the extracted principal component must coincide with vertices lying along a path in $G$. From a statistical perspective, information on the underlying network may potentially reduce the number of observations required to recover the population principal component. We consider the canonical estimator which optimally exploits the prior knowledge by solving a non-convex quadratic maximization on the empirical covariance. We introduce a simple network and analyze the estimator under the spiked covariance model. We show that side information potentially improves the statistical complexity. We propose two algorithms to approximate the solution of the constrained quadratic maximization, and recover a component with the desired properties. We empirically evaluate our schemes on synthetic and real datasets.

neural network, neurology, vertex, (20 more...)

1506.02344

Country:

North America > United States > Texas (0.14)
Europe > France (0.14)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.94)
Banking & Finance (0.67)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Wahlström, Niklas, Schön, Thomas B., Deisenroth, Marc Peter

From Pixels to Torques: Policy Learning with Deep Dynamical Models

Data-efficient learning in continuous state-action spaces using very high-dimensional observations remains a key challenge in developing fully autonomous systems. In this paper, we consider one instance of this challenge, the pixels to torques problem, where an agent must learn a closed-loop control policy from pixel information only. We introduce a data-efficient, model-based reinforcement learning algorithm that learns such a closed-loop policy directly from pixel information. The key ingredient is a deep dynamical model that uses deep auto-encoders to learn a low-dimensional embedding of images jointly with a predictive model in this low-dimensional feature space. Joint learning ensures that not only static but also dynamic properties of the data are accounted for. This is crucial for long-term predictions, which lie at the core of the adaptive model predictive control strategy that we use for closed-loop control. Compared to state-of-the-art reinforcement learning methods for continuous states and actions, our approach learns quickly, scales to high-dimensional state spaces and is an important step toward fully autonomous learning from pixels to torques.

deep learning, neural network, upstream oil & gas, (20 more...)

1502.02251

Country: Europe > Sweden (0.28)

Genre: Research Report (0.64)

Industry: Energy > Oil & Gas > Upstream (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Linderman, Scott W., Johnson, Matthew J., Adams, Ryan P.

Dependent Multinomial Models Made Easy: Stick Breaking with the P\'olya-Gamma Augmentation

Many practical modeling problems involve discrete data that are best represented as draws from multinomial or categorical distributions. For example, nucleotides in a DNA sequence, children's names in a given state and year, and text documents are all commonly modeled with multinomial distributions. In all of these cases, we expect some form of dependency between the draws: the nucleotide at one position in the DNA strand may depend on the preceding nucleotides, children's names are highly correlated from year to year, and topics in text may be correlated and dynamic. These dependencies are not naturally captured by the typical Dirichlet-multinomial formulation. Here, we leverage a logistic stick-breaking representation and recent innovations in P\'olya-gamma augmentation to reformulate the multinomial distribution in terms of latent variables with jointly Gaussian likelihoods, enabling us to take advantage of a host of Bayesian inference techniques for Gaussian models with minimal overhead.

bayesian inference, health & medicine, olya-gamma augmentation, (18 more...)

1506.05843

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

A tree augmented naive Bayesian network experiment for breast cancer prediction

Ren, Ping

In order to investigate the breast cancer prediction problem on the aging population with the grades of DCIS, we conduct a tree augmented naive Bayesian network experiment trained and tested on a large clinical dataset including consecutive diagnostic mammography examinations, consequent biopsy outcomes and related cancer registry records in the population of women across all ages. The aggregated results of our ten-fold cross validation method recommend a biopsy threshold higher than 2% for the aging population.

bayesian inference, biopsy, oncology, (16 more...)

1506.05776

Country: North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report > Experimental Study (0.68)

Industry:

Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (1.00)
Health & Medicine > Diagnostic Medicine (1.00)
Government > Regional Government > North America Government > United States Government (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)