Goto

Collaborating Authors

 Genre


Fused Multiple Graphical Lasso

arXiv.org Machine Learning

In this paper, we consider the problem of estimating multiple graphical models simultaneously using the fused lasso penalty, which encourages adjacent graphs to share similar structures. A motivating example is the analysis of brain networks of Alzheimer's disease using neuroimaging data. Specifically, we may wish to estimate a brain network for the normal controls (NC), a brain network for the patients with mild cognitive impairment (MCI), and a brain network for Alzheimer's patients (AD). We expect the two brain networks for NC and MCI to share common structures but not to be identical to each other; similarly for the two brain networks for MCI and AD. The proposed formulation can be solved using a second-order method. Our key technical contribution is to establish the necessary and sufficient condition for the graphs to be decomposable. Based on this key property, a simple screening rule is presented, which decomposes the large graphs into small subgraphs and allows an efficient estimation of multiple independent (small) subgraphs, dramatically reducing the computational cost. We perform experiments on both synthetic and real data; our results demonstrate the effectiveness and efficiency of the proposed approach.


Assessment of Customer Credit through Combined Clustering of Artificial Neural Networks, Genetics Algorithm and Bayesian Probabilities

arXiv.org Artificial Intelligence

Today, with respect to the increasing growth of demand to get credit from the customers of banks and finance and credit institutions, using an effective and efficient method to decrease the risk of non-repayment of credit given is very necessary. Assessment of customers' credit is one of the most important and the most essential duties of banks and institutions, and if an error occurs in this field, it would leads to the great losses for banks and institutions. Thus, using the predicting computer systems has been significantly progressed in recent decades. The data that are provided to the credit institutions' managers help them to make a straight decision for giving the credit or not-giving it. In this paper, we will assess the customer credit through a combined classification using artificial neural networks, genetics algorithm and Bayesian probabilities simultaneously, and the results obtained from three methods mentioned above would be used to achieve an appropriate and final result. We use the K_folds cross validation test in order to assess the method and finally, we compare the proposed method with the methods such as Clustering-Launched Classification (CLC), Support Vector Machine (SVM) as well as GA SVM where the genetics algorithm has been used to improve them. Keywords-Data classification; Combined Clustring; Artificial Neural Networks; Genetics Algorithm; Bayyesian Probabilities.


Towards a New Science of a Clinical Data Intelligence

arXiv.org Artificial Intelligence

In this paper we define Clinical Data Intelligence as the analysis of data generated in the clinical routine with the goal of improving patient care. We define a science of a Clinical Data Intelligence as a data analysis that permits the derivation of scientific, i.e., generalizable and reliable results. We argue that a science of a Clinical Data Intelligence is sensible in the context of a Big Data analysis, i.e., with data from many patients and with complete patient information. We discuss that Clinical Data Intelligence requires the joint efforts of knowledge engineering, information extraction (from textual and other unstructured data), and statistics and statistical machine learning. We describe some of our main results as conjectures and relate them to a recently funded research project involving two major German university hospitals.


A Fast Hadamard Transform for Signals with Sub-linear Sparsity in the Transform Domain

arXiv.org Machine Learning

A new iterative low complexity algorithm has been presented for computing the Walsh-Hadamard transform (WHT) of an $N$ dimensional signal with a $K$-sparse WHT, where $N$ is a power of two and $K = O(N^\alpha)$, scales sub-linearly in $N$ for some $0 < \alpha < 1$. Assuming a random support model for the non-zero transform domain components, the algorithm reconstructs the WHT of the signal with a sample complexity $O(K \log_2(\frac{N}{K}))$, a computational complexity $O(K\log_2(K)\log_2(\frac{N}{K}))$ and with a very high probability asymptotically tending to 1. The approach is based on the subsampling (aliasing) property of the WHT, where by a carefully designed subsampling of the time domain signal, one can induce a suitable aliasing pattern in the transform domain. By treating the aliasing patterns as parity-check constraints and borrowing ideas from erasure correcting sparse-graph codes, the recovery of the non-zero spectral values has been formulated as a belief propagation (BP) algorithm (peeling decoding) over a sparse-graph code for the binary erasure channel (BEC). Tools from coding theory are used to analyze the asymptotic performance of the algorithm in the very sparse ($\alpha\in(0,\frac{1}{3}]$) and the less sparse ($\alpha\in(\frac{1}{3},1)$) regime.


A General Algorithm for Deciding Transportability of Experimental Results

arXiv.org Machine Learning

Generalizing empirical findings to new environments, settings, or populations is essential in most scientific explorations. This article treats a particular problem of generalizability, called "transportability", defined as a license to transfer information learned in experimental studies to a different population, on which only observational studies can be conducted. Given a set of assumptions concerning commonalities and differences between the two populations, Pearl and Bareinboim (2011) derived sufficient conditions that permit such transfer to take place. This article summarizes their findings and supplements them with an effective procedure for deciding when and how transportability is feasible. It establishes a necessary and sufficient condition for deciding when causal effects in the target population are estimable from both the statistical information available and the causal information transferred from the experiments. The article further provides a complete algorithm for computing the transport formula, that is, a way of combining observational and experimental information to synthesize bias-free estimate of the desired causal relation. Finally, the article examines the differences between transportability and other variants of generalizability.


Generalized Ambiguity Decomposition for Understanding Ensemble Diversity

arXiv.org Machine Learning

Diversity or complementarity of experts in ensemble pattern recognition and information processing systems is widely-observed by researchers to be crucial for achieving performance improvement upon fusion. Understanding this link between ensemble diversity and fusion performance is thus an important research question. However, prior works have theoretically characterized ensemble diversity and have linked it with ensemble performance in very restricted settings. We present a generalized ambiguity decomposition (GAD) theorem as a broad framework for answering these questions. The GAD theorem applies to a generic convex ensemble of experts for any arbitrary twice-differentiable loss function. It shows that the ensemble performance approximately decomposes into a difference of the average expert performance and the diversity of the ensemble. It thus provides a theoretical explanation for the empirically-observed benefit of fusing outputs from diverse classifiers and regressors. It also provides a loss function-dependent, ensemble-dependent, and data-dependent definition of diversity. We present extensions of this decomposition to common regression and classification loss functions, and report a simulation-based analysis of the diversity term and the accuracy of the decomposition. We finally present experiments on standard pattern recognition data sets which indicate the accuracy of the decomposition for real-world classification and regression problems.


Proceedings of Answer Set Programming and Other Computing Paradigms (ASPOCP 2013), 6th International Workshop, August 25, 2013, Istanbul, Turkey

arXiv.org Artificial Intelligence

This volume contains the papers presented at the sixth workshop on Answer Set Programming and Other Computing Paradigms (ASPOCP 2013) held on August 25th, 2013 in Istanbul, co-located with the 29th International Conference on Logic Programming (ICLP 2013). It thus continues a series of previous events co-located with ICLP, aiming at facilitating the discussion about crossing the boundaries of current ASP techniques in theory, solving, and applications, in combination with or inspired by other computing paradigms.


lil' UCB : An Optimal Exploration Algorithm for Multi-Armed Bandits

arXiv.org Machine Learning

The paper proposes a novel upper confidence bound (UCB) procedure for identifying the arm with the largest mean in a multi-armed bandit game in the fixed confidence setting using a small number of total samples. The procedure cannot be improved in the sense that the number of samples required to identify the best arm is within a constant factor of a lower bound based on the law of the iterated logarithm (LIL). Inspired by the LIL, we construct our confidence bounds to explicitly account for the infinite time horizon of the algorithm. In addition, by using a novel stopping time for the algorithm we avoid a union bound over the arms that has been observed in other UCB-type algorithms. We prove that the algorithm is optimal up to constants and also show through simulations that it provides superior performance with respect to the state-of-the-art.


A Comprehensive Approach to Universal Piecewise Nonlinear Regression Based on Trees

arXiv.org Machine Learning

Abstract--In this paper, we investigate adaptive nonlinear regression and introduce tree based piecewise linear regression algorithms that are highly efficient and provide significantly improved performance with guaranteed upper bounds in an individual sequence manner. We use a tree notion in order to partition the space of regressors in a nested structure. The introduced algorithms adapt not only their regression functions but also the complete tree structure while achieving the performance of the "best" linear mixture of a doubly exponential number of partitions, with a computational complexity only polynomial in the number of nodes of the tree. While constructing these algorithms, we also avoid using any artificial "weighting" of models (with highly data dependent parameters) and, instead, directly minimize the final regression error, which is the ultimate performance goal. The introduced methods are generic such that they can readily incorporate different tree construction methods such as random trees in their framework and can use different regressor or partitioning functions as demonstrated in the paper. ONLINEAR adaptive filtering and regression are extensively investigated in the signal processing [1]-[19] and machine learning literatures [20]-[23], especially for applications where linear modeling [24], [25] is inadequate, hence, does not provide satisfactory results due to the structural constraint on linearity. Although nonlinear approaches can be more powerful than linear methods in modeling, they usually suffer from overfitting, stability and convergence issues [1], [26]-[28], which considerably limit their application to signal processing problems. These issues are especially exacerbated in adaptive filtering due to the presence of feedback, which is even hard to control for linear models [26], [27], [29]. Furthermore, for applications involving big data, which require to process input vectors with considerably large dimensions, nonlinear models are usually avoided due to unmanageable computational complexity increase [30].


Positive definite matrices and the S-divergence

arXiv.org Machine Learning

Positive definite matrices abound in a dazzling variety of applications. This ubiquity can be in part attributed to their rich geometric structure: positive definite matrices form a self-dual convex cone whose strict interior is a Riemannian manifold. The manifold view is endowed with a "natural" distance function while the conic view is not. Nevertheless, drawing motivation from the conic view, we introduce the S-Divergence as a "natural" distance-like function on the open cone of positive definite matrices. We motivate the S-divergence via a sequence of results that connect it to the Riemannian distance. In particular, we show that (a) this divergence is the square of a distance; and (b) that it has several geometric properties similar to those of the Riemannian distance, though without being computationally as demanding. The S-divergence is even more intriguing: although nonconvex, we can still compute matrix means and medians using it to global optimality. We complement our results with some numerical experiments illustrating our theorems and our optimization algorithm for computing matrix medians.