AITopics

In this thesis, we propose several modelling strategies to tackle evolving data in different contexts. In the framework of static clustering, we start by introducing a soft kernel spectral clustering (SKSC) algorithm, which can better deal with overlapping clusters with respect to kernel spectral clustering (KSC) and provides more interpretable outcomes. Afterwards, a whole strategy based upon KSC for community detection of static networks is proposed, where the extraction of a high quality training sub-graph, the choice of the kernel function, the model selection and the applicability to large-scale data are key aspects. This paves the way for the development of a novel clustering algorithm for the analysis of evolving networks called kernel spectral clustering with memory effect (MKSC), where the temporal smoothness between clustering results in successive time steps is incorporated at the level of the primal optimization problem, by properly modifying the KSC formulation. Later on, an application of KSC to fault detection of an industrial machine is presented. Here, a smart pre-processing of the data by means of a proper windowing operation is necessary to catch the ongoing degradation process affecting the machine. In this way, in a genuinely unsupervised manner, it is possible to raise an early warning when necessary, in an online fashion. Finally, we propose a new algorithm called incremental kernel spectral clustering (IKSC) for online learning of non-stationary data. This ambitious challenge is faced by taking advantage of the out-of-sample property of kernel spectral clustering (KSC) to adapt the initial model, in order to tackle merging, splitting or drifting of clusters across time. Real-world applications considered in this thesis include image segmentation, time-series clustering, community detection of static and evolving networks.

artificial intelligence, data mining, machine learning, (18 more...)

1411.5988

Country:

North America > United States (1.00)
Europe (1.00)

Genre: Research Report > Experimental Study (0.92)

Industry:

Energy (0.92)
Education > Educational Setting (0.87)
Health & Medicine > Therapeutic Area (0.67)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

A Joint Probabilistic Classification Model of Relevant and Irrelevant Sentences in Mathematical Word Problems

Cetintas, Suleyman, Si, Luo, Xin, Yan Ping, Zhang, Dake, Park, Joo Young, Tzur, Ron

Estimating the difficulty level of math word problems is an important task for many educational applications. Identification of relevant and irrelevant sentences in math word problems is an important step for calculating the difficulty levels of such problems. This paper addresses a novel application of text categorization to identify two types of sentences in mathematical word problems, namely relevant and irrelevant sentences. A novel joint probabilistic classification model is proposed to estimate the joint probability of classification decisions for all sentences of a math word problem by utilizing the correlation among all sentences along with the correlation between the question sentence and other sentences, and sentence text. The proposed model is compared with i) a SVM classifier which makes independent classification decisions for individual sentences by only using the sentence text and ii) a novel SVM classifier that considers the correlation between the question sentence and other sentences along with the sentence text. An extensive set of experiments demonstrates the effectiveness of the joint probabilistic classification model for identifying relevant and irrelevant sentences as well as the novel SVM classifier that utilizes the correlation between the question sentence and other sentences. Furthermore, empirical results and analysis show that i) it is highly beneficial not to remove stopwords and ii) utilizing part of speech tagging does not make a significant improvement although it has been shown to be effective for the related task of math word problem type classification.

artificial intelligence, machine learning, word problem, (18 more...)

1411.5732

Country:

North America > United States > Indiana (0.28)
North America > United States > Colorado (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.68)

Industry: Education > Educational Technology > Educational Software > Computer Based Training (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Predicting the Future Behavior of a Time-Varying Probability Distribution

Lampert, Christoph H.

We study the problem of predicting the future, though only in the probabilistic sense of estimating a future state of a time-varying probability distribution. This is not only an interesting academic problem, but solving this extrapolation problem also has many practical application, e.g. for training classifiers that have to operate under time-varying conditions. Our main contribution is a method for predicting the next step of the time-varying distribution from a given sequence of sample sets from earlier time steps. For this we rely on two recent machine learning techniques: embedding probability distributions into a reproducing kernel Hilbert space, and learning operators by vector-valued regression. We illustrate the working principles and the practical usefulness of our method by experiments on synthetic and real data. We also highlight an exemplary application: training a classifier in a domain adaptation setting without having access to examples from the test time distribution at training time.

artificial intelligence, machine learning, probability distribution, (19 more...)

1406.5362

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

$l_1$-regularized Outlier Isolation and Regression

Han, Sheng, Wang, Suzhen, Wu, Xinyu

This paper proposed a new regression model called $l_1$-regularized outlier isolation and regression (LOIRE) and a fast algorithm based on block coordinate descent to solve this model. Besides, assuming outliers are gross errors following a Bernoulli process, this paper also presented a Bernoulli estimate model which, in theory, should be very accurate and robust due to its complete elimination of affections caused by outliers. Though this Bernoulli estimate is hard to solve, it could be approximately achieved through a process which takes LOIRE as an important intermediate step. As a result, the approximate Bernoulli estimate is a good combination of Bernoulli estimate's accuracy and LOIRE regression's efficiency with several simulations conducted to strongly verify this point. Moreover, LOIRE can be further extended to realize robust rank factorization which is powerful in recovering low-rank component from massive corruptions. Extensive experimental results showed that the proposed method outperforms state-of-the-art methods like RPCA and GoDec in the aspect of computation speed with a competitive performance.

artificial intelligence, machine learning, matrix, (14 more...)

1406.0156

Country: Asia > China (0.29)

Genre:

Research Report > New Finding (0.54)
Research Report > Promising Solution (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.35)

Kimes, Patrick K., Hayes, D. Neil, Marron, J. S., Liu, Yufeng

Large-Margin Classification with Multiple Decision Rules

arXiv.org Machine LearningNov-19-2014

Binary classification is a common statistical learning problem in which a model is estimated on a set of covariates for some outcome indicating the membership of one of two classes. In the literature, there exists a distinction between hard and soft classification. In soft classification, the conditional class probability is modeled as a function of the covariates. In contrast, hard classification methods only target the optimal prediction boundary. While hard and soft classification methods have been studied extensively, not much work has been done to compare the actual tasks of hard and soft classification. In this paper we propose a spectrum of statistical learning problems which span the hard and soft classification tasks based on fitting multiple decision rules to the data. By doing so, we reveal a novel collection of learning tasks of increasing complexity. We study the problems using the framework of large-margin classifiers and a class of piecewise linear convex surrogates, for which we derive statistical properties and a corresponding sub-gradient descent algorithm. We conclude by applying our approach to simulation settings and a magnetic resonance imaging (MRI) dataset from the Alzheimer's Disease Neuroimaging Initiative (ADNI) study.

classification, loss function, soft classification, (13 more...)

1411.526

Country:

North America > United States > North Carolina > Orange County > Chapel Hill (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > Canada (0.04)
(4 more...)

Genre: Research Report > New Finding (0.45)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (0.86)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Fernandez, Joseph A., Boddeti, Vishnu Naresh, Rodriguez, Andres, Kumar, B. V. K. Vijaya

Zero-Aliasing Correlation Filters for Object Recognition

arXiv.org Machine LearningNov-19-2014

Correlation filters (CFs) are a class of classifiers that are attractive for object localization and tracking applications. Traditionally, CFs have been designed in the frequency domain using the discrete Fourier transform (DFT), where correlation is efficiently implemented. However, existing CF designs do not account for the fact that the multiplication of two DFTs in the frequency domain corresponds to a circular correlation in the time/spatial domain. Because this was previously unaccounted for, prior CF designs are not truly optimal, as their optimization criteria do not accurately quantify their optimization intention. In this paper, we introduce new zero-aliasing constraints that completely eliminate this aliasing problem by ensuring that the optimization criterion for a given CF corresponds to a linear correlation rather than a circular correlation. This means that previous CF designs can be significantly improved by this reformulation. We demonstrate the benefits of this new CF design approach with several important CFs. We present experimental results on diverse data sets and present solutions to the computational challenges associated with computing these CFs. Code for the CFs described in this paper and their respective zero-aliasing versions is available at http://vishnu.boddeti.net/projects/correlation-filters.html

constraint, correlation, template, (16 more...)

doi: 10.1109/TPAMI.2014.2375215

1411.2316

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Ohio > Montgomery County > Dayton (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry:

Health & Medicine (0.46)
Education (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Heinonen, Markus, d'Alché-Buc, Florence

Learning nonparametric differential equations with operator-valued kernels and gradient matching

arXiv.org Machine LearningNov-19-2014

Modeling dynamical systems with ordinary differential equations implies a mechanistic view of the process underlying the dynamics. However in many cases, this knowledge is not available. To overcome this issue, we introduce a general framework for nonparametric ODE models using penalized regression in Reproducing Kernel Hilbert Spaces (RKHS) based on operator-valued kernels. Moreover, we extend the scope of gradient matching approaches to nonparametric ODE. A smooth estimate of the solution ODE is built to provide an approximation of the derivative of the ODE solution which is in turn used to learn the nonparametric ODE model. This approach benefits from the flexibility of penalized regression in RKHS allowing for ridge or (structured) sparse regression as well. Very good results are shown on 3 different ODE systems.

artificial intelligence, kernel, machine learning, (17 more...)

1411.5172

Country: Europe > France (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science (0.68)

Mahani, Alireza S., Sharabiani, Mansour T. A.

SIMD Parallel MCMC Sampling with Applications for Big-Data Bayesian Analytics

arXiv.org Artificial IntelligenceNov-19-2014

Computational intensity and sequential nature of estimation techniques for Bayesian methods in statistics and machine learning, combined with their increasing applications for big data analytics, necessitate both the identification of potential opportunities to parallelize techniques such as MCMC sampling, and the development of general strategies for mapping such parallel algorithms to modern CPUs in order to elicit the performance up the compute-based and/or memory-based hardware limits. Two opportunities for Single-Instruction Multiple-Data (SIMD) parallelization of MCMC sampling for probabilistic graphical models are presented. In exchangeable models with many observations such as Bayesian Generalized Linear Models, child-node contributions to the conditional posterior of each node can be calculated concurrently. In undirected graphs with discrete nodes, concurrent sampling of conditionally-independent nodes can be transformed into a SIMD form. High-performance libraries with multi-threading and vectorization capabilities can be readily applied to such SIMD opportunities to gain decent speedup, while a series of high-level source-code and runtime modifications provide further performance boost by reducing parallelization overhead and increasing data locality for NUMA architectures. For big-data Bayesian GLM graphs, the end-result is a routine for evaluating the conditional posterior and its gradient vector that is 5 times faster than a naive implementation using (built-in) multi-threaded Intel MKL BLAS, and reaches within the striking distance of the memory-bandwidth-induced hardware limit. The proposed optimization strategies improve the scaling of performance with number of cores and width of vector units (applicable to many-core SIMD processors such as Intel Xeon Phi and GPUs), resulting in cost-effectiveness, energy efficiency, and higher speed on multi-core x86 processors.

data mining, machine learning, node, (21 more...)

arXiv.org Artificial Intelligence

doi: 10.1016/j.csda.2015.02.010

1310.1537

Country: North America > United States (0.67)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Katz, Itamar, Crammer, Koby

Outlier-Robust Convex Segmentation

arXiv.org Machine LearningNov-18-2014

We derive a convex optimization problem for the task of segmenting sequential data, which explicitly treats presence of outliers. We describe two algorithms for solving this problem, one exact and one a top-down novel approach, and we derive a consistency results for the case of two segments and no outliers. Robustness to outliers is evaluated on two real-world tasks related to speech segmentation. Our algorithms outperform baseline segmentation algorithms.

algorithm, artificial intelligence, machine learning, (18 more...)

1411.4503

Genre: Research Report (1.00)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.67)

Li, Xingguo, Haupt, Jarvis

Identifying Outliers in Large Matrices via Randomized Adaptive Compressive Sampling

arXiv.org Machine LearningNov-18-2014

This paper examines the problem of locating outlier columns in a large, otherwise low-rank, matrix. We propose a simple two-step adaptive sensing and inference approach and establish theoretical guarantees for its performance; our results show that accurate outlier identification is achievable using very few linear summaries of the original data matrix -- as few as the squared rank of the low-rank component plus the number of outliers, times constant and logarithmic factors. We demonstrate the performance of our approach experimentally in two stylized applications, one motivated by robust collaborative filtering tasks, and the other by saliency map estimation tasks arising in computer vision and automated surveillance, and also investigate extensions to settings where the data are noisy, or possibly incomplete.

artificial intelligence, data mining, machine learning, (20 more...)

doi: 10.1109/TSP.2015.2401536

1407.0312

Country: North America > United States > Texas (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)