

Asynchronous Doubly Stochastic Sparse Kernel Learning

AAAI Conferences

Kernel methods have achieved tremendous success in the past two decades. In the current big data era, data collection has grown tremendously. However, existing kernel methods are not scalable enough at either the training or the prediction step. To address this challenge, in this paper we first introduce a general sparse kernel learning formulation based on the random feature approximation, where the loss functions are possibly non-convex. We then propose a new asynchronous parallel doubly stochastic algorithm for large-scale sparse kernel learning (AsyDSSKL). To the best of our knowledge, AsyDSSKL is the first algorithm to combine asynchronous parallel computation with doubly stochastic optimization. We also provide a comprehensive convergence guarantee for AsyDSSKL. Importantly, experimental results on various large-scale real-world datasets show that AsyDSSKL is significantly more computationally efficient at both the training and prediction steps than existing kernel methods.
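
As a rough illustration of the building blocks the abstract names, below is a minimal, single-threaded numpy sketch of random Fourier features combined with a doubly stochastic (sample a data point per step) sparse linear model under a hinge loss. The loss, step sizes, regularization weight, and the fact that the random features are drawn once rather than re-sampled every iteration are illustrative assumptions, not the paper's AsyDSSKL algorithm; the asynchronous parallelism is omitted.

import numpy as np

def random_fourier_features(X, W, b):
    # Map data to D random Fourier features approximating an RBF kernel.
    return np.sqrt(2.0 / W.shape[1]) * np.cos(X @ W + b)

rng = np.random.default_rng(0)
n, d, D = 500, 20, 200                       # samples, input dim, random features (toy sizes)
X = rng.standard_normal((n, d))
y = np.sign(rng.standard_normal(n))          # toy labels

gamma = 1.0                                  # assumed RBF bandwidth parameter
W = rng.normal(0.0, np.sqrt(2 * gamma), (d, D))
b = rng.uniform(0, 2 * np.pi, D)

w, lam, step = np.zeros(D), 1e-3, 0.1
for t in range(2000):                        # doubly stochastic loop: one random data point per step
    i = rng.integers(n)
    z = random_fourier_features(X[i:i + 1], W, b).ravel()
    margin = y[i] * (z @ w)
    grad = -y[i] * z if margin < 1 else 0.0  # hinge-loss (sub)gradient
    w -= step / np.sqrt(t + 1) * grad
    w = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)  # L1 prox step for sparsity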


Inexact Proximal Gradient Methods for Non-Convex and Non-Smooth Optimization

AAAI Conferences

In machine learning research, proximal gradient methods are popular for solving various optimization problems with non-smooth regularization. Inexact proximal gradient methods are extremely important when the proximal operator is time-consuming to solve exactly or has no analytic solution. However, existing inexact proximal gradient methods only consider convex problems; knowledge of inexact proximal gradient methods in the non-convex setting is very limited. To address this challenge, in this paper we first propose three inexact proximal gradient algorithms, including the basic version and Nesterov's accelerated version. We then provide a theoretical analysis of the basic and Nesterov's accelerated versions. The theoretical results show that our inexact proximal gradient algorithms can achieve the same convergence rates as their exact counterparts in the non-convex setting. Finally, we show applications of our inexact proximal gradient algorithms to three representative non-convex learning problems. Empirical results confirm the superiority of the new algorithms.
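
To make the setting concrete, here is a minimal numpy sketch of a proximal gradient loop in which the proximal operator is only solved approximately, by a few inner subgradient steps on the prox subproblem. The least-squares loss, the L1 regularizer, the inner iteration count, and the step sizes are assumptions chosen for illustration; they are not the paper's algorithms or analysis (for L1 the exact prox is simply soft-thresholding).

import numpy as np

def grad_f(w, X, y):
    # Gradient of a smooth loss (least squares here; a non-convex loss plugs in the same way).
    return X.T @ (X @ w - y) / len(y)

def inexact_prox_l1(v, lam, step, inner_iters=5):
    # Approximate prox of step*lam*||.||_1 at v: a few subgradient steps on
    # 0.5*||w - v||^2 + step*lam*||w||_1, standing in for an inexact prox oracle.
    w = v.copy()
    for _ in range(inner_iters):
        g = (w - v) + step * lam * np.sign(w)
        w -= 0.5 * g
    return w

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 30))
w_true = np.zeros(30); w_true[:5] = 1.0
y = X @ w_true + 0.01 * rng.standard_normal(100)

w, lam, step = np.zeros(30), 0.1, 0.05
for t in range(200):                              # outer loop: gradient step, then inexact prox
    w = inexact_prox_l1(w - step * grad_f(w, X, y), lam, step)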


Exercise-Enhanced Sequential Modeling for Student Performance Prediction

AAAI Conferences

In online education systems, a crucial requirement for offering proactive services to students (e.g., personalized exercise recommendation) is to predict student performance (e.g., scores) on future exercising activities. Existing prediction methods mainly exploit students' historical exercising records, where each exercise is usually represented by manually labeled knowledge concepts, while the richer information contained in the text description of exercises remains underexplored. In this paper, we propose a novel Exercise-Enhanced Recurrent Neural Network (EERNN) framework for student performance prediction that takes full advantage of both student exercising records and the text of each exercise. Specifically, to model the student exercising process, we first design a bidirectional LSTM that learns each exercise representation from its text description without requiring expert annotation or incurring information loss. We then propose a new LSTM architecture that traces student states (i.e., knowledge states) through their sequential exercising process by combining the exercise representations. For making final predictions, we design two strategies under EERNN, i.e., EERNNM with a Markov property and EERNNA with an attention mechanism. Extensive experiments on large-scale real-world data clearly demonstrate the effectiveness of the EERNN framework. Moreover, by incorporating exercise correlations, EERNN can handle cold-start problems from both the student and the exercise perspective.
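
A minimal PyTorch sketch of the two ingredients the abstract describes: a bidirectional LSTM that encodes an exercise's word sequence into a vector, and a second LSTM that consumes (exercise embedding, score) pairs to trace a student's state, with a Markov-style prediction from the last hidden state. The vocabulary size, dimensions, pooling, and the way scores are concatenated are illustrative assumptions, not the paper's exact EERNN architecture.

import torch
import torch.nn as nn

class ExerciseEncoder(nn.Module):
    # Bidirectional LSTM over the exercise text; mean-pool hidden states as the exercise vector.
    def __init__(self, vocab_size=5000, emb_dim=50, hidden=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)

    def forward(self, token_ids):                  # (batch, seq_len)
        out, _ = self.lstm(self.emb(token_ids))    # (batch, seq_len, 2*hidden)
        return out.mean(dim=1)                     # (batch, 2*hidden)

class StudentTracer(nn.Module):
    # LSTM over the exercising sequence: each step sees the exercise vector plus the observed score.
    def __init__(self, ex_dim=128, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(ex_dim + 1, hidden, batch_first=True)
        self.pred = nn.Linear(hidden + ex_dim, 1)

    def forward(self, ex_seq, scores, next_ex):    # ex_seq: (B, T, ex_dim), scores: (B, T)
        h_seq, _ = self.lstm(torch.cat([ex_seq, scores.unsqueeze(-1)], dim=-1))
        last_h = h_seq[:, -1, :]                   # Markov-style: use the last student state only
        return torch.sigmoid(self.pred(torch.cat([last_h, next_ex], dim=-1)))

enc, tracer = ExerciseEncoder(), StudentTracer()
tokens = torch.randint(0, 5000, (2, 4, 20))        # 2 students, 4 exercises, 20 tokens each (toy)
ex_vecs = enc(tokens.view(-1, 20)).view(2, 4, -1)  # encode every exercise text
scores = torch.rand(2, 4)
pred = tracer(ex_vecs[:, :3], scores[:, :3], ex_vecs[:, 3])  # predict performance on the 4th exercise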


A Supervised Sparse Learning Framework to Solve EEG Inverse Problem for Discriminative Activations Pattern

AAAI Conferences

Electroencephalography (EEG) is one of the most important noninvasive neuroimaging tools that provides excellent temporal accuracy. As the EEG electrode sensors measure electrical potentials on the scalp instead of direct measuring activities of brain voxels deep inside the head, many approaches are proposed to infer the activated brain regions due to its significance in neuroscience research and clinical application. However, since mostly part of the brain activity is composed of the spontaneous neural activities or non-task related activations, task related activation patterns will be corrupted in strong background signal/noises. In our research, we proposed a sparse learning framework for solving EEG inverse problem which aims to explicitly extract the discriminative sources for different cognitive tasks by fusing the label information into the inverse model. The proposed framework is capable of estimation the discriminative brain sources under given different brain states where traditional inverse methods failed. We introduced two models, one is formulated as supervised sparse dictionary learning and the other one is the graph regularized discriminative source estimation model to promote the consistency within same class. Preliminary experimental results also validated that the proposed sparse learning framework is effective to discover the discriminative task-related brain activation sources, which shows the potential to advance the high resolution EEG source analysis for real-time non-invasive brain imaging research.
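
A minimal numpy sketch of the kind of formulation the second model suggests: estimate sources S from scalp recordings Y = L S (lead field L) with an L1 sparsity penalty plus a graph-Laplacian term that pulls together source estimates from trials sharing a label. The lead field, penalty weights, and the ISTA-style solver are illustrative assumptions, not the paper's exact model or algorithm.

import numpy as np

rng = np.random.default_rng(0)
n_ch, n_src, n_trials = 32, 200, 40
L = rng.standard_normal((n_ch, n_src))             # hypothetical lead-field matrix
labels = np.repeat([0, 1], n_trials // 2)          # two cognitive tasks (toy labels)
Y = rng.standard_normal((n_ch, n_trials))          # toy scalp recordings, one column per trial

# Graph Laplacian over trials: connect trials that share a label to promote within-class consistency.
Wg = (labels[:, None] == labels[None, :]).astype(float)
Lap = np.diag(Wg.sum(1)) - Wg

S = np.zeros((n_src, n_trials))
lam, beta = 0.05, 0.01
step = 1.0 / (np.linalg.norm(L, 2) ** 2 + 2 * beta * np.linalg.norm(Lap, 2))
for _ in range(300):                               # ISTA: gradient step on the smooth terms,
    grad = L.T @ (L @ S - Y) + 2 * beta * S @ Lap  # then soft-threshold for source sparsity
    S = S - step * grad
    S = np.sign(S) * np.maximum(np.abs(S) - step * lam, 0.0)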


Nonnegative Orthogonal Graph Matching

AAAI Conferences

The graph matching problem with pair-wise constraints can be formulated as a Quadratic Assignment Problem (QAP). The optimal solution of a QAP is discrete and combinatorial, which makes the problem NP-hard; thus, many algorithms have been proposed to find approximate solutions. In this paper, we propose a new algorithm, called Nonnegative Orthogonal Graph Matching (NOGM), for the QAP matching problem. NOGM is motivated by our new observation that the discrete mapping constraint of QAP can be equivalently encoded by a nonnegative orthogonal constraint, which is much easier to handle computationally. Based on this observation, we develop an effective multiplicative update algorithm to solve NOGM and thus find an effective approximate solution to the QAP problem. Compared with many traditional continuous methods, which usually obtain continuous solutions that must then be discretized, NOGM obtains a sparse solution and thus incorporates the desirable discrete constraint naturally in its optimization. Promising experimental results demonstrate the benefits of the NOGM algorithm.
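
To illustrate the flavor of such an update, here is a minimal numpy sketch for maximizing tr(XᵀAXB) over nonnegative, approximately orthogonal X, where A and B are nonnegative affinity matrices of the two graphs. The multiplicative square-root rule below is the standard nonnegative-orthogonal update in the style of Ding et al., shown only to indicate the kind of iteration involved; it is not claimed to be the paper's exact NOGM update, and the matrices here are random toy data.

import numpy as np

rng = np.random.default_rng(0)
n = 10
A = rng.random((n, n)); A = (A + A.T) / 2          # affinity matrix of graph 1 (nonnegative)
B = rng.random((n, n)); B = (B + B.T) / 2          # affinity matrix of graph 2 (nonnegative)

X = rng.random((n, n)) + 0.1                        # nonnegative initialization
eps = 1e-12
for _ in range(500):
    num = A @ X @ B
    den = X @ (X.T @ num) + eps                    # the X X^T factor softly enforces X^T X ~ I
    X = X * np.sqrt(num / den)                     # multiplicative update keeps X nonnegative

match = X.argmax(axis=1)                            # discretize the near-sparse solution to a mapping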


Rank Ordering Constraints Elimination with Application for Kernel Learning

AAAI Conferences

A number of machine learning domains, such as information retrieval, recommender systems, kernel learning, and neural network-biological systems, deal with importance scores. Very often, there exists some prior knowledge that could help improve performance. In many cases, this prior knowledge manifests itself as rank ordering constraints. Such inequality constraints are usually very difficult to deal with in optimization. In this paper, we provide a slack variable transformation method that effectively eliminates the rank ordering inequality constraints and thus simplifies the learning task significantly. We apply this transformation to the kernel learning problem and provide an efficient algorithm to solve the transformed system. On seven datasets, our approach reduces the computational time by orders of magnitude compared to the current standard quadratically constrained quadratic programming (QCQP) optimization approach.
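
The slack-variable idea can be illustrated directly: an ordering constraint x1 >= x2 >= ... >= xn >= 0 is equivalent to writing each x_i as a cumulative sum of nonnegative slacks, turning ordered variables into simply nonnegative ones. Below is a minimal numpy sketch of this generic reparameterization (not necessarily the paper's exact construction for the kernel learning setting).

import numpy as np

def slacks_to_scores(s):
    # x_i = s_i + s_{i+1} + ... + s_n with s >= 0 guarantees x_1 >= x_2 >= ... >= x_n >= 0.
    return np.cumsum(s[::-1])[::-1]

def scores_to_slacks(x):
    # Inverse map: s_i = x_i - x_{i+1} (and s_n = x_n); nonnegative iff x is sorted descending.
    return np.append(x[:-1] - x[1:], x[-1])

x = np.array([5.0, 3.0, 3.0, 1.0])
s = scores_to_slacks(x)            # [2., 0., 2., 1.]  -> optimize over s >= 0 instead of ordered x
assert np.allclose(slacks_to_scores(s), x)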


Asynchronous Mini-Batch Gradient Descent with Variance Reduction for Non-Convex Optimization

AAAI Conferences

We provide the first theoretical analysis of the convergence rate of asynchronous mini-batch gradient descent with variance reduction (AsySVRG) for non-convex optimization. Asynchronous stochastic gradient descent (AsySGD) has been widely used for deep learning optimization and is proved to converge at a rate of O(1/\sqrt{T}) for non-convex optimization. Recently, the variance reduction technique was proposed and shown to greatly accelerate the convergence of SGD, and the asynchronous SGD method with variance reduction is known to achieve a linear convergence rate when the problem is strongly convex. However, no existing work analyzes the convergence rate of this method for non-convex problems. In this paper, we consider two asynchronous parallel implementations of the mini-batch gradient descent method with variance reduction: one on a distributed-memory architecture and the other on a shared-memory architecture. We prove that both methods converge at a rate of O(1/T) for non-convex optimization, and that linear speedup is achievable as the number of workers increases. We evaluate our methods by optimizing multi-layer neural networks on two real datasets (MNIST and CIFAR-10), and the experimental results support our theoretical analysis.
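
For reference, here is a minimal, single-threaded numpy sketch of mini-batch SVRG, the variance-reduced gradient the asynchronous variants build on: each epoch fixes a snapshot, computes its full gradient once, and corrects every mini-batch gradient with the snapshot terms. The logistic loss, step size, and batch size are illustrative assumptions, and the asynchronous multi-worker execution and neural-network objective are omitted.

import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 20))
y = (X @ rng.standard_normal(20) > 0).astype(float)

def grad(w, idx):
    # Mini-batch logistic-loss gradient on the examples indexed by `idx`.
    p = 1.0 / (1.0 + np.exp(-X[idx] @ w))
    return X[idx].T @ (p - y[idx]) / len(idx)

w, step, batch = np.zeros(20), 0.1, 32
for epoch in range(20):
    w_snap = w.copy()
    full_grad = grad(w_snap, np.arange(len(y)))     # full gradient at the snapshot point
    for _ in range(len(y) // batch):
        idx = rng.integers(0, len(y), batch)
        # Variance-reduced gradient: stochastic gradient corrected by the snapshot terms.
        v = grad(w, idx) - grad(w_snap, idx) + full_grad
        w -= step * v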


A Sparse Dictionary Learning Framework to Discover Discriminative Source Activations in EEG Brain Mapping

AAAI Conferences

Electroencephalography (EEG) source analysis is one of the most important noninvasive human brain imaging tools, providing millisecond temporal accuracy. However, discovering the essential activated brain sources associated with different brain states remains a challenging problem. In this study, we propose for the first time that the ill-posed EEG inverse problem can be formulated and solved as a sparse over-complete dictionary learning problem. In particular, a novel supervised sparse dictionary learning framework is developed for EEG source reconstruction, and a revised version of the discriminative K-SVD (DK-SVD) algorithm is used to solve the formulated supervised dictionary learning problem. Because the proposed learning framework incorporates the EEG label information of different brain states, it is capable of learning a sparse representation that reveals the most discriminative brain activity sources among those states. Extensive numerical experiments show that, compared to state-of-the-art EEG source analysis methods, the proposed sparse dictionary learning framework achieves significantly superior performance in both computing speed and accuracy on the challenging EEG source reconstruction problem. More importantly, the experimental results validate that the proposed sparse learning framework is effective at discovering the discriminative task-related brain activation sources, showing its potential to advance high-resolution EEG source analysis for real-time non-invasive brain imaging research.
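
The discriminative K-SVD idea the abstract refers to is often sketched by stacking the signals Y with a weighted label-indicator matrix H and learning a single dictionary for the stacked matrix, so the sparse codes are both reconstructive and label-discriminative. Below is a minimal sketch in that spirit, with scikit-learn's generic dictionary learner standing in for K-SVD; the sizes, the trade-off weight gamma, and the solver are assumptions, not the paper's revised DK-SVD algorithm.

import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
n_ch, n_trials, n_classes, n_atoms = 32, 60, 2, 20
Y = rng.standard_normal((n_ch, n_trials))               # toy EEG trials (channels x trials)
labels = rng.integers(0, n_classes, n_trials)
H = np.eye(n_classes)[labels].T                          # label-indicator matrix (classes x trials)

gamma = 1.0                                              # assumed reconstruction/label trade-off
stacked = np.vstack([Y, np.sqrt(gamma) * H])             # DK-SVD-style stacked problem

# DictionaryLearning expects samples in rows, so pass trials as rows of the stacked matrix.
dl = DictionaryLearning(n_components=n_atoms, transform_algorithm="lasso_lars", alpha=1.0)
codes = dl.fit_transform(stacked.T)                      # sparse codes, one row per trial
D_joint = dl.components_.T                               # stacked dictionary: [D; sqrt(gamma)*W]
D, W = D_joint[:n_ch], D_joint[n_ch:] / np.sqrt(gamma)   # signal dictionary and label predictor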


Semi-Supervised Classifications via Elastic and Robust Embedding

AAAI Conferences

Transductive semi-supervised learning can only predict labels for unlabeled data that appear in the training data; it cannot predict labels for test data never seen during training. To handle this out-of-sample problem, many inductive methods impose the constraint that the predicted label matrix be exactly equal to a linear model. In practice, this constraint may be too rigid to capture the manifold structure of the data. In this paper, we relax this rigid constraint and instead place an elastic constraint on the predicted label matrix so that the manifold structure can be better explored. Moreover, since unlabeled data are often very abundant in practice and usually contain outliers, we use a non-squared loss instead of the traditional squared loss to learn a robust model. The derived problem, although convex, has many nonsmooth terms, which makes it very challenging to solve. In this paper, we propose an efficient optimization algorithm that solves a more general problem, from which we obtain the optimal solution to the derived problem.
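
The robustness benefit of a non-squared loss can be illustrated with the standard reweighting trick: an L2,1-type loss, the sum of row-wise residual norms, is minimized by alternating a weighted least-squares solve with weights 1/(2||e_i||). Below is a minimal numpy sketch on a plain multi-output regression model, not the paper's full elastic-embedding objective or its optimization algorithm; the data, ridge term, and iteration count are illustrative.

import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
W_true = rng.standard_normal((10, 3))
Y = X @ W_true + 0.05 * rng.standard_normal((200, 3))
Y[:10] += 5.0                                    # a few outlying rows

W, eps = np.zeros((10, 3)), 1e-8
for _ in range(30):                              # iteratively reweighted least squares for the L2,1 loss
    R = Y - X @ W                                # row-wise residuals
    d = 1.0 / (2.0 * np.maximum(np.linalg.norm(R, axis=1), eps))   # small weight for outlying rows
    XtDX = X.T @ (d[:, None] * X)
    W = np.linalg.solve(XtDX + 1e-6 * np.eye(10), X.T @ (d[:, None] * Y))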


Local Centroids Structured Non-Negative Matrix Factorization

AAAI Conferences

Non-negative Matrix Factorization (NMF) has attracted much attention and been widely used in real-world applications. As a clustering method, however, it fails to handle data points that lie on a complicated geometric structure: existing methods adopt a single global centroid for each cluster and thus fail to capture the manifold structure. In this paper, we propose a novel local-centroids structured NMF to address this drawback. Instead of using a single centroid per cluster, we introduce multiple local centroids for each cluster so that the manifold structure can be captured by the local centroids. This novel NMF method improves clustering performance effectively. Furthermore, a novel bipartite graph is incorporated to obtain the clustering indicator directly, without any post-processing. Experiments on both toy and real-world datasets verify the effectiveness of the proposed method.
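
For reference, the classical single-centroid NMF baseline that this work generalizes can be written as the Lee-Seung multiplicative updates below; this minimal numpy sketch uses toy data and does not include the local-centroid structure or the bipartite graph the paper introduces.

import numpy as np

rng = np.random.default_rng(0)
X = np.abs(rng.standard_normal((100, 50)))        # nonnegative data (samples x features)
k, eps = 5, 1e-10                                  # number of clusters / global centroids

U = rng.random((100, k))                           # soft cluster-indicator matrix
V = rng.random((k, 50))                            # one global centroid per cluster
for _ in range(300):                               # multiplicative updates for ||X - U V||_F^2
    U *= (X @ V.T) / (U @ V @ V.T + eps)
    V *= (U.T @ X) / (U.T @ U @ V + eps)

clusters = U.argmax(axis=1)                        # cluster assignment read off the indicator matrix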