AITopics

1504.0287

Country: Asia > Japan (0.14)

Genre: Research Report (1.00)

Industry: Education (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.89)

arXiv.org Machine Learning, Apr-11-2015

Generalized Correntropy for Robust Adaptive Filtering

Chen, Badong; Xing, Lei; Zhao, Haiquan; Zheng, Nanning; Príncipe, José C.

As a robust nonlinear similarity measure in kernel space, correntropy has received increasing attention in machine learning and signal processing. In particular, the maximum correntropy criterion (MCC) has recently been applied successfully to robust regression and filtering. The default kernel function in correntropy is the Gaussian kernel, which is not always the best choice. In this work, we propose a generalized correntropy that adopts the generalized Gaussian density (GGD) function as the kernel (not necessarily a Mercer kernel), and present some of its important properties. We further propose the generalized maximum correntropy criterion (GMCC) and apply it to adaptive filtering. An adaptive algorithm, called the GMCC algorithm, is derived, and its mean square convergence performance is studied. We show that the proposed algorithm is very stable and can achieve zero probability of divergence (POD). Simulation results confirm the theoretical expectations and demonstrate the desirable performance of the new algorithm.
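The following minimal Python sketch illustrates the quantities named in the abstract and rests on several assumptions. The kernel is taken to be the zero-mean generalized Gaussian density $G_{\alpha,\beta}(e) = \frac{\alpha}{2\beta\Gamma(1/\alpha)}\exp(-|e/\beta|^\alpha)$, and the generalized correntropy is estimated by a sample mean. The adaptive filter is a linear FIR model trained by stochastic-gradient ascent on $G_{\alpha,\beta}(e_k)$, with the kernel's constant factors folded into the step size `eta`. The function names, the FIR setup and the demo data are illustrative only, not the authors' code.

```python
import math
import random


def ggd_kernel(e, alpha=2.0, beta=1.0):
    """Zero-mean generalized Gaussian density G_{alpha,beta}(e)."""
    norm = alpha / (2.0 * beta * math.gamma(1.0 / alpha))
    return norm * math.exp(-abs(e / beta) ** alpha)


def generalized_correntropy(xs, ys, alpha=2.0, beta=1.0):
    """Sample-mean estimate of V(X, Y) = E[G_{alpha,beta}(X - Y)]."""
    return sum(ggd_kernel(x - y, alpha, beta) for x, y in zip(xs, ys)) / len(xs)


def gmcc_fir(inputs, desired, order, eta=0.02, alpha=2.0, beta=1.0):
    """Stochastic-gradient ascent on G_{alpha,beta}(e_k) for a linear FIR filter.

    The regressor x_k holds the last `order` input samples; constant factors
    of the kernel gradient are folded into the step size eta (an assumption).
    """
    lam = 1.0 / beta ** alpha
    w = [0.0] * order
    for k in range(order - 1, len(inputs)):
        x = [inputs[k - i] for i in range(order)]
        e = desired[k] - sum(wi * xi for wi, xi in zip(w, x))
        if e == 0.0:
            # Treat sign(0) as 0 (no update); also avoids 0.0 ** negative when alpha < 1.
            continue
        g = math.exp(-lam * abs(e) ** alpha) * abs(e) ** (alpha - 1) * math.copysign(1.0, e)
        w = [wi + eta * g * xi for wi, xi in zip(w, x)]
    return w


if __name__ == "__main__":
    rng = random.Random(0)
    true_w = [0.5, -0.3]
    u = [rng.gauss(0.0, 1.0) for _ in range(5000)]
    d = []
    for k in range(len(u)):
        clean = true_w[0] * u[k] + (true_w[1] * u[k - 1] if k >= 1 else 0.0)
        # Mostly small Gaussian noise, with about 5% large impulsive outliers.
        noise = rng.gauss(0.0, 20.0) if rng.random() < 0.05 else rng.gauss(0.0, 0.05)
        d.append(clean + noise)
    print(gmcc_fir(u, d, order=2))
```

With $\alpha = 2$ the kernel reduces to a Gaussian and each update is an MCC-style LMS step. The factor $\exp(-\lambda|e_k|^\alpha)$, where $\lambda = 1/\beta^\alpha$, shrinks updates driven by large (outlier) errors, which is the intuitive source of the robustness.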

algorithm, artificial intelligence, machine learning

doi: 10.1109/TSP.2016.2539127

1504.02931

Country:

Asia > China (0.46)
North America > United States > Florida > Alachua County > Gainesville (0.14)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Belanich, Joshua; Ortiz, Luis E.

On the Convergence Properties of Optimal AdaBoost

arXiv.org Artificial Intelligence, Apr-11-2015

AdaBoost is one of the most popular machine-learning algorithms. It is simple to implement, and practitioners often find it very effective, while it remains mathematically elegant and theoretically sound. AdaBoost's behavior in practice, and in particular its test-error behavior, has puzzled many eminent researchers for over a decade: it seems to defy our general intuition in machine learning regarding the fundamental trade-off between model complexity and generalization performance. In this paper, we establish the convergence of "Optimal AdaBoost," a term coined by Rudin, Daubechies, and Schapire in 2004. Under certain reasonable conditions, we prove that, as the number of rounds grows, the classifier itself, its generalization error, and its resulting margins converge for fixed data sets. More generally, we prove that the time (per-round) average of almost any function of the example weights converges. Our approach is to frame AdaBoost as a dynamical system, to provide sufficient conditions for the existence of an invariant measure, and to employ tools from ergodic theory. Unlike previous work, we do not assume that AdaBoost cycles; in fact, we present empirical evidence against cycling on real-world datasets. Our main theoretical results hold under a weaker condition, and we provide empirical evidence that Optimal AdaBoost met this condition on every real-world dataset we tried. Our results formally ground future convergence-rate analyses, and may even provide opportunities for slight algorithmic modifications to optimize the generalization ability of AdaBoost classifiers, thus reducing a practitioner's burden of deciding how long to run the algorithm.
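The small Python sketch below makes the dynamical-system view concrete. It assumes that "Optimal AdaBoost" means the weak learner always returns the hypothesis with minimum weighted error from a fixed finite class. Here that class is one-dimensional threshold stumps, chosen only for illustration. Each call to `adaboost_step` applies the deterministic map on the example-weight vector once. The running per-round average of the weighted error is one instance of the time averages whose convergence the paper studies. All names are hypothetical.

```python
import math


def stumps(xs):
    """Finite class of 1-D threshold stumps (t, s): predict s if x > t else -s."""
    pts = sorted(set(xs))
    thresholds = [pts[0] - 1.0] + [(a + b) / 2.0 for a, b in zip(pts, pts[1:])]
    return [(t, s) for t in thresholds for s in (1, -1)]


def predict(h, x):
    t, s = h
    return s if x > t else -s


def weighted_error(h, weights, xs, ys):
    return sum(w for w, x, y in zip(weights, xs, ys) if predict(h, x) != y)


def adaboost_step(weights, xs, ys, hyps):
    """One round: pick the minimum-weighted-error hypothesis, then reweight.

    Assumes eps > 0, i.e. no single hypothesis classifies every example correctly.
    """
    h = min(hyps, key=lambda g: weighted_error(g, weights, xs, ys))
    eps = weighted_error(h, weights, xs, ys)
    alpha = 0.5 * math.log((1.0 - eps) / eps)
    new = [w * math.exp(-alpha * y * predict(h, x)) for w, x, y in zip(weights, xs, ys)]
    z = sum(new)
    return [w / z for w in new], h, alpha, eps


def run(xs, ys, rounds):
    """Iterate the weight map; track the per-round average of the weighted error."""
    hyps = stumps(xs)
    weights = [1.0 / len(xs)] * len(xs)
    ensemble, running_avg, total = [], [], 0.0
    for t in range(1, rounds + 1):
        weights, h, alpha, eps = adaboost_step(weights, xs, ys, hyps)
        ensemble.append((alpha, h))
        total += eps
        running_avg.append(total / t)
    return weights, ensemble, running_avg


def classify(ensemble, x):
    """Sign of the alpha-weighted vote of the selected stumps."""
    score = sum(a * predict(h, x) for a, h in ensemble)
    return 1 if score >= 0 else -1
```

A standard AdaBoost property holds after every update: the hypothesis just selected has weighted error exactly 1/2 under the new weights. The weak learner therefore cannot pick it again on the next round.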

adaboost, health & medicine, survey article

arXiv.org Artificial Intelligence

1212.1108

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)

Industry: Health & Medicine > Therapeutic Area (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Cho, Hyunghoon; Berger, Bonnie; Peng, Jian

Diffusion Component Analysis: Unraveling Functional Topology in Biological Networks

Complex biological systems have been successfully modeled by biochemical and genetic interaction networks, typically gathered from high-throughput (HTP) data. These networks can be used to infer functional relationships between genes or proteins. Building on the intuition that the topological role of a gene in a network relates to its biological function, local or diffusion-based "guilt-by-association" and graph-theoretic methods have had success in inferring gene functions. Here we seek to improve function prediction by integrating diffusion-based methods with a novel dimensionality reduction technique to overcome the incomplete and noisy nature of network data. In this paper, we introduce diffusion component analysis (DCA), a framework that plugs in a diffusion model and learns a low-dimensional vector representation of each node to encode the topological properties of a network. As a proof of concept, we demonstrate DCA's substantial improvement over state-of-the-art diffusion-based approaches in predicting protein function from molecular interaction networks. Moreover, our DCA framework can integrate multiple networks from heterogeneous sources, including genomic information, biochemical experiments, and other resources, to improve function prediction further. A further performance gain is achieved by integrating the DCA framework with support vector machines that take our node vector representations as features. Overall, our DCA framework provides a novel representation of nodes in a network that can be plugged into other machine learning algorithms to decipher the topological properties of interactomes and obtain novel insights into them.
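Random walk with restart (RWR), assumed here as the diffusion model, can illustrate the diffusion step that DCA plugs in. The sketch below takes an unweighted adjacency matrix and computes the stationary RWR distribution (its "diffusion state") for every node by fixed-point iteration. DCA's subsequent step of compressing these states into low-dimensional vectors is not reproduced. The function name and the `restart` and `iters` parameters are illustrative.

```python
def rwr_diffusion_states(adj, restart=0.5, iters=200):
    """Random-walk-with-restart distribution for each start node.

    adj: square nonnegative adjacency matrix as nested lists; every node must
    have at least one neighbor. For start node i, iterates
        s <- (1 - restart) * s P + restart * e_i,
    where P is the row-normalized transition matrix.
    """
    n = len(adj)
    P = []
    for row in adj:
        deg = sum(row)  # assumes deg > 0 (no isolated nodes)
        P.append([a / deg for a in row])
    states = []
    for i in range(n):
        s = [1.0 if j == i else 0.0 for j in range(n)]
        for _ in range(iters):
            nxt = [(1.0 - restart) * sum(s[k] * P[k][j] for k in range(n)) for j in range(n)]
            nxt[i] += restart  # teleport back to the start node
            s = nxt
        states.append(s)
    return states


if __name__ == "__main__":
    # Path graph 0 - 1 - 2 - 3
    path = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
    for i, s in enumerate(rwr_diffusion_states(path)):
        print(i, [round(v, 3) for v in s])
```

Each diffusion state is a probability vector that decays with graph distance from its start node. Nodes with similar states occupy similar topological positions, which is the signal a low-dimensional embedding such as DCA's aims to retain.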

artificial intelligence, health & medicine, protein

1504.02719

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > Illinois (0.14)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.87)

Geras, Krzysztof J.; Sutton, Charles

Scheduled denoising autoencoders

We present a representation learning method that learns features at multiple levels of scale. Working within the unsupervised framework of denoising autoencoders, we observe that when the input is heavily corrupted during training, the network tends to learn coarse-grained features, whereas when the input is only slightly corrupted, the network tends to learn fine-grained features. This motivates the scheduled denoising autoencoder, which starts with a high level of noise that is lowered as training progresses. We find that the resulting representation yields a significant boost on a later supervised task compared to the original input, or to a standard denoising autoencoder trained at a single noise level. After supervised fine-tuning, our best model achieves the lowest reported error on the CIFAR-10 data set among permutation-invariant methods.
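The noise schedule described above can be sketched in Python as follows. Masking noise and a linear decrease from a high to a low corruption level are assumptions of this sketch, and the paper's exact corruption process and schedule may differ. The generator yields (level, clean, corrupted) triples with the highest noise level first. A denoising autoencoder would be trained to reconstruct `clean` from `corrupted`, but the training step itself is omitted. All names are hypothetical.

```python
import random


def linear_noise_schedule(start, end, num_stages):
    """Corruption levels decreasing linearly from `start` to `end` (inclusive)."""
    if num_stages == 1:
        return [start]
    step = (start - end) / (num_stages - 1)
    return [start - i * step for i in range(num_stages)]


def mask_corrupt(x, level, rng):
    """Masking noise: set each component to 0 independently with probability `level`."""
    return [0.0 if rng.random() < level else xi for xi in x]


def scheduled_batches(data, levels, epochs_per_level, seed=0):
    """Yield (level, clean, corrupted) triples, highest noise level first."""
    rng = random.Random(seed)
    for level in levels:
        for _ in range(epochs_per_level):
            for x in data:
                yield level, x, mask_corrupt(x, level, rng)


if __name__ == "__main__":
    levels = linear_noise_schedule(0.7, 0.1, 4)
    for level, clean, corrupted in scheduled_batches([[1.0, 2.0, 3.0, 4.0]], levels, 1):
        print(round(level, 2), clean, corrupted)
```

Early stages see heavily masked inputs, which in the abstract's account favors coarse-grained features. Later stages see nearly clean inputs, which favors fine-grained ones.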

deep learning, neural network, noise level

1406.3269

Country: North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Gradient of Probability Density Functions based Contrasts for Blind Source Separation (BSS)

C, Dharmani Bhaveshkumar

The article derives some novel independence measures and contrast functions for Blind Source Separation (BSS). For $k$th-order differentiable multivariate functions with equal hypervolumes (the regions bounded by their hypersurfaces) and, for $k>1$, bounded support, it proves that equality of any of their $k$th-order derivatives implies equality of the functions themselves. The difference between the product of the marginal Probability Density Functions (PDFs) and the joint PDF of a random vector is defined as the Function Difference (FD) of that random vector. Assuming the PDFs are $k$th-order differentiable, these general results on functions are applied to the independence condition. This yields new sets of independence measures and BSS contrasts based on the $L^p$-norm, $p \geq 1$, of the FD, of the gradient of FD (GFD), and of the Hessian of FD (HFD). Instead of the conventional two-stage, indirect estimation of joint-PDF-based BSS contrasts, a single-stage, direct estimation of the contrasts is sought. The article targets both the efficient estimation of the proposed contrasts and the extension of potential theory to an information field. Potential theory uses the concept of a reference potential to derive closed-form expressions for the relative analysis of a potential field. Analogously, the article introduces the concepts of Reference Information Potential (RIP) and Cross Reference Information Potential (CRIP), based on the potential due to kernel functions placed at selected sample points, which serve as a basis as in kernel methods. These quantities are used to derive closed-form least-squares expressions for information field analysis, which in turn are used to estimate the $L^2$-norm of FD and $L^2$-norm of GFD based contrasts.
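The following sketch gives a discrete illustration of the Function Difference. It assumes the joint distribution is given as a 2-D probability table rather than a continuous PDF. FD is the product of the marginals minus the joint, and its squared $L^2$-norm vanishes exactly when the table factorizes, i.e. under independence. A forward-difference gradient gives a discrete analogue of the GFD contrast. This sketch does not implement the article's kernel-based RIP/CRIP estimators, and the function names are hypothetical.

```python
def function_difference(joint):
    """FD[i][j] = p_X(i) * p_Y(j) - p_XY(i, j) for a 2-D joint probability table."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return [[px[i] * py[j] - joint[i][j] for j in range(len(py))] for i in range(len(px))]


def l2_sq(table):
    """Squared L2 norm (sum of squares) of a 2-D table."""
    return sum(v * v for row in table for v in row)


def grad_l2_sq(table):
    """Squared L2 norm of forward differences along both axes (discrete gradient)."""
    rows, cols = len(table), len(table[0])
    di = sum((table[i + 1][j] - table[i][j]) ** 2 for i in range(rows - 1) for j in range(cols))
    dj = sum((table[i][j + 1] - table[i][j]) ** 2 for i in range(rows) for j in range(cols - 1))
    return di + dj


if __name__ == "__main__":
    indep = [[0.08, 0.12], [0.32, 0.48]]  # equals [0.2, 0.8] outer [0.4, 0.6]
    dep = [[0.5, 0.0], [0.0, 0.5]]        # X = Y, fully dependent
    for name, t in (("independent", indep), ("dependent", dep)):
        fd = function_difference(t)
        print(name, l2_sq(fd), grad_l2_sq(fd))
```

Both quantities are (numerically) zero for the factorized table and positive for the dependent one. A contrast needs exactly this property to be minimized over candidate demixing transforms.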

artificial intelligence, estimation, machine learning

1504.02712

Country:

Asia > Middle East > Iran (0.14)
Asia > India > Gujarat (0.14)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Kernel Methods (0.54)

Croteau, Nicole; Nathoo, Farouk S.; Cao, Jiguo; Budney, Ryan

High-Dimensional Classification for Brain Decoding

Brain decoding involves determining a subject's cognitive state, or an associated stimulus, from functional neuroimaging data measuring brain activity. In this setting, the cognitive state is typically characterized by an element of a finite set, and the neuroimaging data comprise large volumes of spatiotemporal measurements of some aspect of the neural signal. The associated statistical problem is one of classification from high-dimensional data. We explore the use of functional principal component analysis, mutual information networks, and persistent homology for examining the data through exploratory analysis and for constructing features characterizing the neural signal for brain decoding. We review each approach from this perspective, and we incorporate the features into a classifier based on symmetric multinomial logistic regression with elastic net regularization. The approaches are illustrated in an application where the task is to infer, from brain activity measured with magnetoencephalography (MEG), the type of video stimulus shown to a subject.
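The classifier named above can be written as a penalized objective. This sketch assumes a glmnet-style parametrization, in which "symmetric" means one coefficient vector per class with no reference class. The elastic-net penalty mixes $\ell_1$ and squared $\ell_2$ terms with mixing weight `alpha` and overall strength `lam`, and intercepts are unpenalized. The sketch only evaluates the objective. Fitting it (e.g. by coordinate descent) and the paper's feature construction are not shown, and the names are hypothetical.

```python
import math


def softmax(z):
    """Numerically stable softmax of a list of scores."""
    m = max(z)
    ex = [math.exp(v - m) for v in z]
    s = sum(ex)
    return [v / s for v in ex]


def elastic_net_mlr_objective(W, b, X, y, lam, alpha):
    """Penalized mean negative log-likelihood of symmetric multinomial logistic regression.

    W: K rows of p coefficients (one per class, no reference class); b: K intercepts.
    X: n feature vectors of length p; y: n integer labels in range(K).
    Penalty (intercepts unpenalized):
        lam * sum_k [(1 - alpha) / 2 * ||w_k||_2^2 + alpha * ||w_k||_1]
    """
    nll = 0.0
    for xi, yi in zip(X, y):
        scores = [bk + sum(wkj * xij for wkj, xij in zip(wk, xi)) for wk, bk in zip(W, b)]
        nll -= math.log(softmax(scores)[yi])
    penalty = sum((1.0 - alpha) / 2.0 * sum(w * w for w in wk) + alpha * sum(abs(w) for w in wk)
                  for wk in W)
    return nll / len(X) + lam * penalty


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, -1.0], [0.0, 0.0]]
    print(elastic_net_mlr_objective(W, [0.0, 0.0, 0.0], [[0.5, 1.0]], [0], lam=0.1, alpha=0.5))
```

Setting `alpha` to 1 gives a lasso-type penalty that drives many coefficients exactly to zero, which suits very high-dimensional neuroimaging features. Setting it to 0 gives ridge-type shrinkage. Without a reference class, the penalty is also what pins down a unique set of per-class coefficients.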

classifier, neurology, survey article

1504.028

Country: North America > United States (0.14)

Genre:

Research Report > Experimental Study (0.49)
Research Report > New Finding (0.35)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.35)

Deep Narrow Boltzmann Machines are Universal Approximators

Montufar, Guido

We show that deep narrow Boltzmann machines are universal approximators of probability distributions on the activities of their visible units, provided they have sufficiently many hidden layers, each containing the same number of units as the visible layer. We show that, within certain parameter domains, deep Boltzmann machines can be studied as feedforward networks. We provide upper and lower bounds on the sufficient depth and width of universal approximators. These results settle various intuitions regarding undirected networks and, in particular, they show that deep narrow Boltzmann machines are at least as compact universal approximators as narrow sigmoid belief networks and restricted Boltzmann machines, with respect to the currently available bounds for those models.

deep learning, neural network, probability distribution

1411.3784

Country: North America > United States (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Gribonval, Rémi; Jenatton, Rodolphe; Bach, Francis; Kleinsteuber, Martin; Seibert, Matthias

Sample Complexity of Dictionary Learning and other Matrix Factorizations

arXiv.org Machine Learning, Apr-9-2015

Many modern tools in machine learning and signal processing, such as sparse dictionary learning, principal component analysis (PCA), non-negative matrix factorization (NMF), and $K$-means clustering, rely on the factorization of a matrix obtained by concatenating high-dimensional vectors from a training collection. While the idealized task would be to optimize the expected quality of the factors over the underlying distribution of training vectors, in practice it is achieved by minimizing an empirical average over the considered collection. The focus of this paper is to provide sample complexity estimates that uniformly control how much the empirical average deviates from the expected cost function. Standard arguments imply that the performance of the empirical predictor also exhibits such guarantees. The approach is generic enough to encompass several possible constraints on the factors (tensor product structure, shift-invariance, sparsity, etc.), thus providing a unified perspective on the sample complexity of several widely used matrix factorization schemes. For the considered matrix factorization techniques, the derived generalization bounds scale proportionally to $\sqrt{\log(n)/n}$ with respect to the number of samples $n$.
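The general shape of such a guarantee can be written out as follows. The notation is introduced here for illustration and is not necessarily the paper's. Here $f_x(D)$ is the loss of sample $x$ under factors $D$ (for $K$-means, for instance, $f_x(D) = \min_k \lVert x - d_k \rVert_2^2$), $\mathcal{D}$ is the constraint set, and $c$ is a constant that depends on the problem dimensions, the constraints, and the confidence level. The bound is shown schematically, and $\log$ is the natural logarithm. The third line is the standard argument that transfers a uniform deviation bound to the empirical minimizer.

```latex
\begin{aligned}
F_n(D) &= \frac{1}{n}\sum_{i=1}^{n} f_{x_i}(D), \qquad F(D) = \mathbb{E}_{x}\, f_x(D),\\
\sup_{D \in \mathcal{D}} \bigl| F_n(D) - F(D) \bigr| &\le c\,\sqrt{\frac{\log n}{n}} \quad \text{(with high probability)},\\
F(\hat{D}_n) - \inf_{D \in \mathcal{D}} F(D) &\le 2 \sup_{D \in \mathcal{D}} \bigl| F_n(D) - F(D) \bigr|, \qquad \hat{D}_n \in \arg\min_{D \in \mathcal{D}} F_n(D),\\
\sqrt{\log(10^2)/10^2} &\approx 0.215, \qquad \sqrt{\log(10^4)/10^4} \approx 0.030.
\end{aligned}
```

The last line shows that a 100-fold increase in $n$ tightens such a bound by a factor of about $\sqrt{50} \approx 7.1$. A pure $1/\sqrt{n}$ rate would give a factor of 10, and the difference comes from the logarithm doubling.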

artificial intelligence, constraint, machine learning

1312.379

Country:

Europe > France > Île-de-France (0.14)
Europe > Germany > Bavaria (0.14)
North America > United States > California (0.14)

Genre: Research Report (0.50)

Industry: Education (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

arXiv.org Machine Learning, Apr-9-2015

'Local' vs. 'global' parameters – breaking the Gaussian complexity barrier

Mendelson, Shahar

We show that if $F$ is a convex class of functions that is $L$-subgaussian, the error rate of learning problems generated by independent noise is equivalent to a fixed point determined by 'local' covering estimates of the class, rather than by the Gaussian averages. To that end, we establish new sharp upper and lower estimates on the error rate for such problems.

artificial intelligence, machine learning, probability

1504.02191

Country:

Asia > Middle East > Israel (0.14)
Oceania > Australia (0.14)

Genre: Research Report (0.64)

Industry: Education (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)