Asia
Kernel-based Information Criterion
Danafar, Somayeh, Fukumizu, Kenji, Gomez, Faustino
This paper introduces Kernel-based Information Criterion (KIC) for model selection in regression analysis. The novel kernel-based complexity measure in KIC efficiently computes the interdependency between parameters of the model using a variable-wise variance and yields selection of better, more robust regressors. Experimental results show superior performance on both simulated and real data sets compared to Leave-One-Out Cross-Validation (LOOCV), kernel-based Information Complexity (ICOMP), and maximum log of marginal likelihood in Gaussian Process Regression (GPR).
Smoothed Low Rank and Sparse Matrix Recovery by Iteratively Reweighted Least Squares Minimization
Lu, Canyi, Lin, Zhouchen, Yan, Shuicheng
This work presents a general framework for solving the low rank and/or sparse matrix minimization problems, which may involve multiple non-smooth terms. The Iteratively Reweighted Least Squares (IRLS) method is a fast solver, which smooths the objective function and minimizes it by alternately updating the variables and their weights. However, the traditional IRLS can only solve a sparse only or low rank only minimization problem with squared loss or an affine constraint. This work generalizes IRLS to solve joint/mixed low rank and sparse minimization problems, which are essential formulations for many tasks. As a concrete example, we solve the Schatten-$p$ norm and $\ell_{2,q}$-norm regularized Low-Rank Representation (LRR) problem by IRLS, and theoretically prove that the derived solution is a stationary point (globally optimal if $p,q\geq1$). Our convergence proof of IRLS is more general than previous one which depends on the special properties of the Schatten-$p$ norm and $\ell_{2,q}$-norm. Extensive experiments on both synthetic and real data sets demonstrate that our IRLS is much more efficient.
Multi-Target Shrinkage
Bartz, Daniel, Hรถhne, Johannes, Mรผller, Klaus-Robert
Stein showed that the multivariate sample mean is outperformed by "shrinking" to a constant target vector. Ledoit and Wolf extended this approach to the sample covariance matrix and proposed a multiple of the identity as shrinkage target. In a general framework, independent of a specific estimator, we extend the shrinkage concept by allowing simultaneous shrinkage to a set of targets. Application scenarios include settings with (A) additional data sets from potentially similar distributions, (B) non-stationarity, (C) a natural grouping of the data or (D) multiple alternative estimators which could serve as targets. We show that this Multi-Target Shrinkage can be translated into a quadratic program and derive conditions under which the estimation of the shrinkage intensities yields optimal expected squared error in the limit. For the sample mean and the sample covariance as specific instances, we derive conditions under which the optimality of MTS is applicable. We consider two asymptotic settings: the large dimensional limit (LDL), where the dimensionality and the number of observations go to infinity at the same rate, and the finite observations large dimensional limit (FOLDL), where only the dimensionality goes to infinity while the number of observations remains constant. We then show the effectiveness in extensive simulations and on real world data.
Detection of cheating by decimation algorithm
Yamanaka, Shogo, Ohzeki, Masayuki, Decelle, Aurelien
We expand the item response theory to study the case of "cheating students" for a set of exams, trying to detect them by applying a greedy algorithm of inference. This extended model is closely related to the Boltzmann machine learning. In this paper we aim to infer the correct biases and interactions of our model by considering a relatively small number of sets of training data. Nevertheless, the greedy algorithm that we employed in the present study exhibits good performance with a few number of training data. The key point is the sparseness of the interactions in our problem in the context of the Boltzmann machine learning: the existence of cheating students is expected to be very rare (possibly even in real world). We compare a standard approach to infer the sparse interactions in the Boltzmann machine learning to our greedy algorithm and we find the latter to be superior in several aspects.
Fuzzy human motion analysis: A review
Lim, Chern Hong, Vats, Ekta, Chan, Chee Seng
Human Motion Analysis (HMA) is currently one of the most popularly active research domains as such significant research interests are motivated by a number of real world applications such as video surveillance, sports analysis, healthcare monitoring and so on. However, most of these real world applications face high levels of uncertainties that can affect the operations of such applications. Hence, the fuzzy set theory has been applied and showed great success in the recent past. In this paper, we aim at reviewing the fuzzy set oriented approaches for HMA, individuating how the fuzzy set may improve the HMA, envisaging and delineating the future perspectives. To the best of our knowledge, there is not found a single survey in the current literature that has discussed and reviewed fuzzy approaches towards the HMA. For ease of understanding, we conceptually classify the human motion into three broad levels: Low-Level (LoL), Mid-Level (MiL), and High-Level (HiL) HMA.
Worst-Case Linear Discriminant Analysis as Scalable Semidefinite Feasibility Problems
Li, Hui, Shen, Chunhua, Hengel, Anton van den, Shi, Qinfeng
In this paper, we propose an efficient semidefinite programming (SDP) approach to worst-case linear discriminant analysis (WLDA). Compared with the traditional LDA, WLDA considers the dimensionality reduction problem from the worst-case viewpoint, which is in general more robust for classification. However, the original problem of WLDA is non-convex and difficult to optimize. In this paper, we reformulate the optimization problem of WLDA into a sequence of semidefinite feasibility problems. To efficiently solve the semidefinite feasibility problems, we design a new scalable optimization method with quasi-Newton methods and eigen-decomposition being the core components. The proposed method is orders of magnitude faster than standard interior-point based SDP solvers. Experiments on a variety of classification problems demonstrate that our approach achieves better performance than standard LDA. Our method is also much faster and more scalable than standard interior-point SDP solvers based WLDA. The computational complexity for an SDP with $m$ constraints and matrices of size $d$ by $d$ is roughly reduced from $\mathcal{O}(m^3+md^3+m^2d^2)$ to $\mathcal{O}(d^3)$ ($m>d$ in our case).
Noise Benefits in Expectation-Maximization Algorithms
This dissertation shows that careful injection of noise into sample data can substantially speed up Expectation-Maximization algorithms. Expectation-Maximization algorithms are a class of iterative algorithms for extracting maximum likelihood estimates from corrupted or incomplete data. The convergence speed-up is an example of a noise benefit or "stochastic resonance" in statistical signal processing. The dissertation presents derivations of sufficient conditions for such noise-benefits and demonstrates the speed-up in some ubiquitous signal-processing algorithms. These algorithms include parameter estimation for mixture models, the $k$-means clustering algorithm, the Baum-Welch algorithm for training hidden Markov models, and backpropagation for training feedforward artificial neural networks. This dissertation also analyses the effects of data and model corruption on the more general Bayesian inference estimation framework. The main finding is a theorem guaranteeing that uniform approximators for Bayesian model functions produce uniform approximators for the posterior pdf via Bayes theorem. This result also applies to hierarchical and multidimensional Bayesian models.
Iterative Plan Construction for the Workflow Satisfiability Problem
Cohen, D., Crampton, J., Gagarin, A., Gutin, G., Jones, M.
The Workflow Satisfiability Problem (WSP) is a problem of practical interest that arises whenever tasks need to be performed by authorized users, subject to constraints defined by business rules. We are required to decide whether there exists a plan - an assignment of tasks to authorized users - such that all constraints are satisfied. It is natural to see the WSP as a subclass of the Constraint Satisfaction Problem (CSP) in which the variables are tasks and the domain is the set of users. What makes the WSP distinctive is that the number of tasks is usually very small compared to the number of users, so it is appropriate to ask for which constraint languages the WSP is fixed-parameter tractable (FPT), parameterized by the number of tasks. This novel approach to the WSP, using techniques from CSP, has enabled us to design a generic algorithm which is FPT for several families of workflow constraints considered in the literature. Furthermore, we prove that the union of FPT languages remains FPT if they satisfy a simple compatibility condition. Lastly, we identify a new FPT constraint language, user-independent constraints, that includes many of the constraints of interest in business processing systems. We demonstrate that our generic algorithm has provably optimal running time O*(2^(klog k)), for this language, where k is the number of tasks.
$l_1$-regularized Outlier Isolation and Regression
Han, Sheng, Wang, Suzhen, Wu, Xinyu
This paper proposed a new regression model called $l_1$-regularized outlier isolation and regression (LOIRE) and a fast algorithm based on block coordinate descent to solve this model. Besides, assuming outliers are gross errors following a Bernoulli process, this paper also presented a Bernoulli estimate model which, in theory, should be very accurate and robust due to its complete elimination of affections caused by outliers. Though this Bernoulli estimate is hard to solve, it could be approximately achieved through a process which takes LOIRE as an important intermediate step. As a result, the approximate Bernoulli estimate is a good combination of Bernoulli estimate's accuracy and LOIRE regression's efficiency with several simulations conducted to strongly verify this point. Moreover, LOIRE can be further extended to realize robust rank factorization which is powerful in recovering low-rank component from massive corruptions. Extensive experimental results showed that the proposed method outperforms state-of-the-art methods like RPCA and GoDec in the aspect of computation speed with a competitive performance.
Large-Margin Classification with Multiple Decision Rules
Kimes, Patrick K., Hayes, D. Neil, Marron, J. S., Liu, Yufeng
Binary classification is a common statistical learning problem in which a model is estimated on a set of covariates for some outcome indicating the membership of one of two classes. In the literature, there exists a distinction between hard and soft classification. In soft classification, the conditional class probability is modeled as a function of the covariates. In contrast, hard classification methods only target the optimal prediction boundary. While hard and soft classification methods have been studied extensively, not much work has been done to compare the actual tasks of hard and soft classification. In this paper we propose a spectrum of statistical learning problems which span the hard and soft classification tasks based on fitting multiple decision rules to the data. By doing so, we reveal a novel collection of learning tasks of increasing complexity. We study the problems using the framework of large-margin classifiers and a class of piecewise linear convex surrogates, for which we derive statistical properties and a corresponding sub-gradient descent algorithm. We conclude by applying our approach to simulation settings and a magnetic resonance imaging (MRI) dataset from the Alzheimer's Disease Neuroimaging Initiative (ADNI) study.