Statistical Learning
A Framework for Optimizing Paper Matching
Charlin, Laurent, Zemel, Richard S., Boutilier, Craig
At the heart of many scientific conferences is the problem of matching submitted papers to suitable reviewers. Arriving at a good assignment is a major and important challenge for any conference organizer. In this paper we propose a framework to optimize paper-to-reviewer assignments. Our framework uses suitability scores to measure pairwise affinity between papers and reviewers. We show how learning can be used to infer suitability scores from a small set of provided scores, thereby reducing the burden on reviewers and organizers. We frame the assignment problem as an integer program and propose several variations for the paper-to-reviewer matching domain. We also explore how learning and matching interact. Experiments on two conference data sets examine the performance of several learning methods as well as the effectiveness of the matching formulations.
Regularized Tensor Factorizations and Higher-Order Principal Components Analysis
High-dimensional tensors or multi-way data are becoming prevalent in areas such as biomedical imaging, chemometrics, networking and bibliometrics. Traditional approaches to finding lower dimensional representations of tensor data include flattening the data and applying matrix factorizations such as principal components analysis (PCA) or employing tensor decompositions such as the CANDECOMP / PARAFAC (CP) and Tucker decompositions. The former can lose important structure in the data, while the latter Higher-Order PCA (HOPCA) methods can be problematic in high-dimensions with many irrelevant features. We introduce frameworks for sparse tensor factorizations or Sparse HOPCA based on heuristic algorithmic approaches and by solving penalized optimization problems related to the CP decomposition. Extensions of these approaches lead to methods for general regularized tensor factorizations, multi-way Functional HOPCA and generalizations of HOPCA for structured data. We illustrate the utility of our methods for dimension reduction, feature selection, and signal recovery on simulated data and multi-dimensional microarrays and functional MRIs.
Unfair items detection in educational measurement
Measurement professionals cannot come to an agreement on the definition of the term 'item fairness'. In this paper a continuous measure of item unfairness is proposed. The more the unfairness measure deviates from zero, the less fair the item is. If the measure exceeds the cutoff value, the item is identified as definitely unfair. The new approach can identify unfair items that would not be identified with conventional procedures. The results are in accord with experts' judgments on the item qualities. Since no assumptions about scores distributions and/or correlations are assumed, the method is applicable to any educational test. Its performance is illustrated through application to scores of a real test.
A General Theory of Concave Regularization for High Dimensional Sparse Estimation Problems
Concave regularization methods provide natural procedures for sparse recovery. However, they are difficult to analyze in the high dimensional setting. Only recently a few sparse recovery results have been established for some specific local solutions obtained via specialized numerical procedures. Still, the fundamental relationship between these solutions such as whether they are identical or their relationship to the global minimizer of the underlying nonconvex formulation is unknown. The current paper fills this conceptual gap by presenting a general theoretical framework showing that under appropriate conditions, the global solution of nonconvex regularization leads to desirable recovery performance; moreover, under suitable conditions, the global solution corresponds to the unique sparse local solution, which can be obtained via different numerical procedures. Under this unified framework, we present an overview of existing results and discuss their connections. The unified view of this work leads to a more satisfactory treatment of concave high dimensional sparse estimation procedures, and serves as guideline for developing further numerical procedures for concave regularization.
A framework: Cluster detection and multidimensional visualization of automated data mining using intelligent agents
Jayabrabu, R., Saravanan, V., Vivekanandan, K.
Data Mining techniques plays a vital role like extraction of required knowledge, finding unsuspected information to make strategic decision in a novel way which in term understandable by domain experts. A generalized frame work is proposed by considering non - domain experts during mining process for better understanding, making better decision and better finding new patters in case of selecting suitable data mining techniques based on the user profile by means of intelligent agents.
Classification of artificial intelligence ids for smurf attack
Ugtakhbayar, N., Battulga, D., Sodbileg, Sh.
Many methods have been developed to secure the network infrastructure and communication over the Internet. Intrusion detection is a relatively new addition to such techniques. Intrusion detection systems (IDS) are used to find out if someone has intrusion into or is trying to get it the network. One big problem is amount of Intrusion which is increasing day by day. We need to know about network attack information using IDS, then analysing the effect. Due to the nature of IDSs which are solely signature based, every new intrusion cannot be detected; so it is important to introduce artificial intelligence (AI) methods / techniques in IDS. Introduction of AI necessitates the importance of normalization in intrusions. This work is focused on classification of AI based IDS techniques which will help better design intrusion detection systems in the future. We have also proposed a support vector machine for IDS to detect Smurf attack with much reliable accuracy.
Algebraic Geometric Comparison of Probability Distributions
Kiraly, Franz J., von Buenau, Paul, Meinecke, Frank C., Blythe, Duncan A. J., Mueller, Klaus-Robert
We propose a novel algebraic algorithmic framework for dealing with probability distributions represented by their cumulants such as the mean and covariance matrix. As an example, we consider the unsupervised learning problem of finding the subspace on which several probability distributions agree. Instead of minimizing an objective function involving the estimated cumulants, we show that by treating the cumulants as elements of the polynomial ring we can directly solve the problem, at a lower computational cost and with higher accuracy. Moreover, the algebraic viewpoint on probability distributions allows us to invoke the theory of algebraic geometry, which we demonstrate in a compact proof for an identifiability criterion.
Minimax Rates of Estimation for Sparse PCA in High Dimensions
We study sparse principal components analysis in the high-dimensional setting, where $p$ (the number of variables) can be much larger than $n$ (the number of observations). We prove optimal, non-asymptotic lower and upper bounds on the minimax estimation error for the leading eigenvector when it belongs to an $\ell_q$ ball for $q \in [0,1]$. Our bounds are sharp in $p$ and $n$ for all $q \in [0, 1]$ over a wide class of distributions. The upper bound is obtained by analyzing the performance of $\ell_q$-constrained PCA. In particular, our results provide convergence rates for $\ell_1$-constrained PCA.
Beta-Negative Binomial Process and Poisson Factor Analysis
Zhou, Mingyuan, Hannah, Lauren, Dunson, David, Carin, Lawrence
A beta-negative binomial (BNB) process is proposed, leading to a beta-gamma-Poisson process, which may be viewed as a "multi-scoop" generalization of the beta-Bernoulli process. The BNB process is augmented into a beta-gamma-gamma-Poisson hierarchical structure, and applied as a nonparametric Bayesian prior for an infinite Poisson factor analysis model. A finite approximation for the beta process Levy random measure is constructed for convenient implementation. Efficient MCMC computations are performed with data augmentation and marginalization techniques. Encouraging results are shown on document count matrix factorization.
A Reconstruction Error Formulation for Semi-Supervised Multi-task and Multi-view Learning
Qian, Buyue, Wang, Xiang, Davidson, Ian
A significant challenge to make learning techniques more sui table for general purpose use is to move beyond i) complete supervision, ii) low dimensional data, iii) a single t ask and single view per instance. Solving these challenges a llows working with "Big Data" problems that are typically high dim ensional with multiple (but possibly incomplete) labeling s and views. While other work has addressed each of these probl ems separately, in this paper we show how to address them together, namelysemi-supervised dimension reduction for multi-task and multi-view learning (SSDR-MML), which performs optimization for dimension reduction and label inference in semi-supervised setting. The proposed framework is designed to handle both multi-task and multi-view learning settings, and can be easily adapted to many useful applications. Inform ation obtained from all tasks and views is combined via reconstruction errors in a linear fashion that can be efficiently solvedusing an alternating optimization scheme. Our formulation has a number of advantages. W e explicitly model the information combining mechanism as a data structure (a weight/nearest-nei ghbor matrix) which allows investigating fundamental ques tions in multi-task and multi-view learning. W e address one such question by presenting a general measure to quantify the success of simultaneous learning of multiple tasks or from multiple views. W e show that our SSDR-MML approach can outperform many state-of-the-art baseline methods and demonstrate the effectiveness of connecting dimension reduction and learning.