AITopics

1211.1328

Country: North America > United States (0.93)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.87)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.45)

arXiv.org Machine LearningSep-30-2013

Structured learning of sum-of-submodular higher order energy functions

Fix, Alexander, Joachims, Thorsten, Park, Sam, Zabih, Ramin

Submodular functions can be exactly minimized in polynomial time, and the special case that graph cuts solve with max flow [18] has had significant impact in computer vision [5, 20, 27]. In this paper we address the important class of sum-of-submodular (SoS) functions [2, 17], which can be efficiently minimized via a variant of max flow called submodular flow [6]. SoS functions can naturally express higher order priors involving, e.g., local image patches; however, it is difficult to fully exploit their expressive power because they have so many parameters. Rather than trying to formulate existing higher order priors as an SoS function, we take a discriminative learning approach, effectively searching the space of SoS functions for a higher order prior that performs well on our training set. We adopt a structural SVM approach [14, 33] and formulate the training problem in terms of quadratic programming; as a result we can efficiently search the space of SoS priors via an extended cutting-plane algorithm. We also show how the state-of-the-art max flow method for vision problems [10] can be modified to efficiently solve the submodular flow problem. Experimental comparisons are made against the OpenCV implementation of the GrabCut interactive segmentation technique [27], which uses hand-tuned parameters instead of machine learning. On a standard dataset [11] our method learns higher order priors with hundreds of parameter values, and produces significantly better segmentations. While our focus is on binary labeling problems, we show that our techniques can be naturally generalized to handle more than two labels.

algorithm, artificial intelligence, machine learning, (19 more...)

1309.7512

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.96)

arXiv.org Machine LearningSep-29-2013

Fast Marginalized Block Sparse Bayesian Learning Algorithm

Liu, Benyuan, Zhang, Zhilin, Fan, Hongqi, Fu, Qiang

The performance of sparse signal recovery from noise corrupted, underdetermined measurements can be improved if both sparsity and correlation structure of signals are exploited. One typical correlation structure is the intra-block correlation in block sparse signals. To exploit this structure, a framework, called block sparse Bayesian learning (BSBL), has been proposed recently. Algorithms derived from this framework showed superior performance but they are not very fast, which limits their applications. This work derives an efficient algorithm from this framework, using a marginalized likelihood maximization method. Compared to existing BSBL algorithms, it has close recovery performance but is much faster. Therefore, it is more suitable for large scale datasets and applications requiring real-time implementation.

artificial intelligence, bayesian inference, machine learning, (16 more...)

1211.4909

Country:

Asia > China (0.04)
North America > United States > Texas > Dallas County > Richardson (0.04)
North America > United States > Florida > Monroe County > Key West (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.86)

arXiv.org Machine LearningSep-29-2013

Error AMP Chain Graphs

Peña, Jose M.

Any regular Gaussian probability distribution that can be represented by an AMP chain graph (CG) can be expressed as a system of linear equations with correlated errors whose structure depends on the CG. However, the CG represents the errors implicitly, as no nodes in the CG correspond to the errors. We propose in this paper to add some deterministic nodes to the CG in order to represent the errors explicitly. We call the result an EAMP CG. We will show that, as desired, every AMP CG is Markov equivalent to its corresponding EAMP CG under marginalization of the error nodes. We will also show that every EAMP CG under marginalization of the error nodes is Markov equivalent to some LWF CG under marginalization of the error nodes, and that the latter is Markov equivalent to some directed and acyclic graph (DAG) under marginalization of the error nodes and conditioning on some selection nodes. This is important because it implies that the independence model represented by an AMP CG can be accounted for by some data generating process that is partially observed and has selection bias. Finally, we will show that EAMP CGs are closed under marginalization. This is a desirable feature because it guarantees parsimonious models under marginalization.

artificial intelligence, machine learning, node, (18 more...)

1306.6843

Country: Europe (0.46)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.68)

arXiv.org Machine LearningSep-29-2013

An upper bound on prototype set size for condensed nearest neighbor

Christiansen, Eric

The nearest neighbor (NN) rule assigns to an unclassified point the class of a closest point from a set of prototypical points. The NN algorithm stores every training point as a prototypical point and classifies new points according to the NN rule. A nice property is that, for arbitrary class distributions, as the number of training points goes to infinity, the error of the rule produced by the NN algorithm converges to within twice the Bayes error [4]. Unfortunately, storing every training point as a prototypical point can be impractical for huge training sets in terms of both memory complexity and the time complexity of classifying according to the NN rule. As a result, many techniques exist for reducing the size of the set of prototypical points.

algorithm, artificial intelligence, machine learning, (17 more...)

1309.7676

Country: North America > United States (0.29)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (0.67)

Orchard, Peter, Agakov, Felix, Storkey, Amos

Bayesian Inference in Sparse Gaussian Graphical Models

One of the fundamental tasks of science is to find explainable relationships between observed phenomena. One approach to this task that has received attention in recent years is based on probabilistic graphical modelling with sparsity constraints on model structures. In this paper, we describe two new approaches to Bayesian inference of sparse structures of Gaussian graphical models (GGMs). One is based on a simple modification of the cutting-edge block Gibbs sampler for sparse GGMs, which results in significant computational gains in high dimensions. The other method is based on a specific construction of the Hamiltonian Monte Carlo sampler, which results in further significant improvements. We compare our fully Bayesian approaches with the popular regularisation-based graphical LASSO, and demonstrate significant advantages of the Bayesian treatment under the same computing costs. We apply the methods to a broad range of simulated data sets, and a real-life financial data set.

artificial intelligence, machine learning, sampler, (13 more...)

doi: 10.1017/S0956796814000057

1309.7311

Genre: Research Report (0.50)

Industry: Banking & Finance (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Zeng, Xiangrong, Figueiredo, Mário A. T.

Solving OSCAR regularization problems by proximal splitting algorithms

The OSCAR (octagonal selection and clustering algorithm for regression) regularizer consists of a L_1 norm plus a pair-wise L_inf norm (responsible for its grouping behavior) and was proposed to encourage group sparsity in scenarios where the groups are a priori unknown. The OSCAR regularizer has a non-trivial proximity operator, which limits its applicability. We reformulate this regularizer as a weighted sorted L_1 norm, and propose its grouping proximity operator (GPO) and approximate proximity operator (APO), thus making state-of-the-art proximal splitting algorithms (PSAs) available to solve inverse problems with OSCAR regularization. The GPO is in fact the APO followed by additional grouping and averaging operations, which are costly in time and storage, explaining the reason why algorithms with APO are much faster than that with GPO. The convergences of PSAs with GPO are guaranteed since GPO is an exact proximity operator. Although convergence of PSAs with APO is may not be guaranteed, we have experimentally found that APO behaves similarly to GPO when the regularization parameter of the pair-wise L_inf norm is set to an appropriately small value. Experiments on recovery of group-sparse signals (with unknown groups) show that PSAs with APO are very fast and accurate.

algorithm, artificial intelligence, machine learning, (17 more...)

1309.6301

Country: Europe (0.46)

Genre: Research Report (0.64)

Industry:

Media > Film (0.55)
Leisure & Entertainment (0.55)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)

Pérez-Cruz, Fernando, Van Vaerenbergh, Steven, Murillo-Fuentes, Juan José, Lázaro-Gredilla, Miguel, Santamaria, Ignacio

Gaussian Processes for Nonlinear Signal Processing

Gaussian processes (GPs) are Bayesian state-of-the-art tools for discriminative machine learning, i.e., regression [1], classification [2] and dimensionality reduction [3]. GPs were first proposed in statistics by Tony O'Hagan [4] and they are well-known to the geostatistics community as kriging. However, due to their high computational complexity they did not become widely applied tools in machine learning until the early XXI century [5]. GPs can be interpreted as a family of kernel methods with the additional advantage of providing a full conditional statistical description for the predicted variable, which can be primarily used to establish confidence intervals and to set hyper-parameters. In a nutshell, Gaussian processes assume that a Gaussian process prior governs the set of possible latent functions (which are unobserved), and the likelihood (of the latent function) and observations shape this prior to produce posterior probabilistic estimates.

artificial intelligence, machine learning, modeling & simulation, (17 more...)

doi: 10.1109/MSP.2013.2250352

1303.2823

Country:

North America > United States (0.46)
Europe > Spain (0.28)

Genre:

Research Report (0.64)
Instructional Material (0.46)

Industry: Energy (0.34)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Colombo, Diego, Maathuis, Marloes H.

Order-independent constraint-based causal structure learning

We consider constraint-based methods for causal structure learning, such as the PC-, FCI-, RFCI- and CCD- algorithms (Spirtes et al. (2000, 1993), Richardson (1996), Colombo et al. (2012), Claassen et al. (2013)). The first step of all these algorithms consists of the PC-algorithm. This algorithm is known to be order-dependent, in the sense that the output can depend on the order in which the variables are given. This order-dependence is a minor issue in low-dimensional settings. We show, however, that it can be very pronounced in high-dimensional settings, where it can lead to highly variable results. We propose several modifications of the PC-algorithm (and hence also of the other algorithms) that remove part or all of this order-dependence. All proposed modifications are consistent in high-dimensional settings under the same conditions as their original counterparts. We compare the PC-, FCI-, and RFCI-algorithms and their modifications in simulation studies and on a yeast gene expression data set. We show that our modifications yield similar performance in low-dimensional settings and improved performance in high-dimensional settings. All software is implemented in the R-package pcalg.

algorithm, artificial intelligence, machine learning, (15 more...)

1211.3295

Country:

North America > United States (0.45)
Europe (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)

Diffusion map for clustering fMRI spatial maps extracted by independent component analysis

Sipola, Tuomo, Cong, Fengyu, Ristaniemi, Tapani, Alluri, Vinoo, Toiviainen, Petri, Brattico, Elvira, Nandi, Asoke K.

Functional magnetic resonance imaging (fMRI) produces data about activity inside the brain, from which spatial maps can be extracted by independent component analysis (ICA). In datasets, there are n spatial maps that contain p voxels. The number of voxels is very high compared to the number of analyzed spatial maps. Clustering of the spatial maps is usually based on correlation matrices. This usually works well, although such a similarity matrix inherently can explain only a certain amount of the total variance contained in the high-dimensional data where n is relatively small but p is large. For high-dimensional space, it is reasonable to perform dimensionality reduction before clustering. In this research, we used the recently developed diffusion map for dimensionality reduction in conjunction with spectral clustering. This research revealed that the diffusion map based clustering worked as well as the more traditional methods, and produced more compact clusters when needed.

artificial intelligence, data mining, machine learning, (16 more...)

doi: 10.1109/MLSP.2013.6661923

1306.135

Country:

Europe (0.95)
North America > United States (0.28)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (0.71)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.96)