Chen, Li
Semi-supervised classification for dynamic Android malware detection
Chen, Li, Zhang, Mingwei, Yang, Chih-Yuan, Sahita, Ravi
A growing number of threats to Android phones creates challenges for malware detection. Manually labeling samples as benign or as members of different malicious families requires tremendous human effort, while it is comparatively easy and cheap to obtain a large number of unlabeled APKs from various sources. Moreover, the fast-paced evolution of Android malware continuously generates derivative malware families. These families often contain new signatures, which can escape detection by static analysis. These practical challenges can also cause traditional supervised machine learning algorithms to degrade in performance. In this paper, we propose a framework that uses a model-based semi-supervised (MBSS) classification scheme on dynamic Android API call logs. The semi-supervised approach efficiently uses the labeled and unlabeled APKs to estimate a finite mixture model of Gaussian distributions via conditional expectation-maximization and efficiently detects malware during out-of-sample testing. We compare MBSS with popular malware detection classifiers such as support vector machine (SVM), $k$-nearest neighbor ($k$NN) and linear discriminant analysis (LDA). Under the ideal classification setting, MBSS has competitive performance, with 98% accuracy and a very low false positive rate for in-sample classification. For out-of-sample testing, the test data exhibit behavior similar to that of the in-sample training set in retrieving phone information and sending it to the network. When this similarity is strong, MBSS and SVM with a linear kernel maintain a 90% detection rate, while $k$NN and LDA suffer great performance degradation. When this similarity is slightly weaker, all classifiers degrade in performance, but MBSS still performs significantly better than the other classifiers.
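The idea of fitting a Gaussian mixture from labeled and unlabeled samples together can be illustrated with a minimal sketch. This is not the authors' MBSS implementation: it uses plain (not conditional) expectation-maximization on synthetic one-dimensional data, and every name in it (`fit_semi_supervised_gmm`, `predict`, the toy data) is hypothetical. Labeled points keep fixed class memberships, while unlabeled points receive soft memberships that are re-estimated at each iteration.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_semi_supervised_gmm(labeled, unlabeled, n_classes=2, n_iter=50):
    """Fit one Gaussian per class from (x, class) pairs plus unlabeled x values."""
    # Initialise each component from the labeled points of its class.
    means, variances, weights = [], [], []
    for k in range(n_classes):
        xs = [x for x, y in labeled if y == k]
        m = sum(xs) / len(xs)
        means.append(m)
        variances.append(sum((x - m) ** 2 for x in xs) / len(xs) + 1e-6)
        weights.append(len(xs) / len(labeled))

    for _ in range(n_iter):
        # E-step: labeled points keep hard memberships,
        # unlabeled points get posterior class probabilities.
        resp = [(x, [1.0 if k == y else 0.0 for k in range(n_classes)])
                for x, y in labeled]
        for x in unlabeled:
            dens = [weights[k] * gaussian_pdf(x, means[k], variances[k])
                    for k in range(n_classes)]
            total = sum(dens)
            if total == 0.0:  # numerical underflow: fall back to uniform
                resp.append((x, [1.0 / n_classes] * n_classes))
            else:
                resp.append((x, [d / total for d in dens]))

        # M-step: weighted maximum-likelihood updates over all points.
        n = len(resp)
        for k in range(n_classes):
            nk = sum(r[k] for _, r in resp)
            means[k] = sum(r[k] * x for x, r in resp) / nk
            variances[k] = sum(r[k] * (x - means[k]) ** 2 for x, r in resp) / nk + 1e-6
            weights[k] = nk / n
    return means, variances, weights


def predict(x, params):
    """Return the index of the class with the largest posterior score."""
    means, variances, weights = params
    scores = [w * gaussian_pdf(x, m, v) for m, v, w in zip(means, variances, weights)]
    return scores.index(max(scores))


# Synthetic toy data (an assumption for illustration): class 0 ~ N(0, 1), class 1 ~ N(5, 1).
rng = random.Random(0)
labeled = ([(rng.gauss(0.0, 1.0), 0) for _ in range(5)]
           + [(rng.gauss(5.0, 1.0), 1) for _ in range(5)])
unlabeled = ([rng.gauss(0.0, 1.0) for _ in range(200)]
             + [rng.gauss(5.0, 1.0) for _ in range(200)])
params = fit_semi_supervised_gmm(labeled, unlabeled)
print(predict(0.0, params), predict(5.0, params))
```

With only five labeled points per class, the unlabeled samples do most of the work of estimating the component means and variances, which is the motivation the abstract gives for the semi-supervised setting.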
Sparse Algorithm for Robust LSSVM in Primal Space
Chen, Li, Zhou, Shuisheng
Because it admits a closed-form solution, the least squares support vector machine (LSSVM) has been widely used for classification and regression problems, with performance comparable to that of other types of SVMs. However, LSSVM has two drawbacks: it is sensitive to outliers and it lacks sparseness. Robust LSSVM (R-LSSVM) partly overcomes the first via a nonconvex truncated loss function, but the current algorithms for R-LSSVM produce dense solutions, so they still suffer from the second drawback and are inefficient for training large-scale problems. In this paper, we interpret the robustness of R-LSSVM from a re-weighted viewpoint and give a primal R-LSSVM by the representer theorem. The new model may have a sparse solution if the corresponding kernel matrix has low rank. Then, by approximating the kernel matrix with a low-rank matrix and smoothing the loss function with an entropy penalty function, we propose a convergent sparse R-LSSVM (SR-LSSVM) algorithm to achieve the sparse solution of primal R-LSSVM, which overcomes both drawbacks of LSSVM simultaneously. The proposed algorithm has lower complexity than the existing algorithms and is very efficient for training large-scale problems. Many experimental results illustrate that SR-LSSVM can achieve better or comparable performance with less training time than related algorithms, especially for training large-scale problems.
Robust Vertex Classification
Chen, Li, Shen, Cencheng, Vogelstein, Joshua, Priebe, Carey
For random graphs distributed according to stochastic blockmodels, a special case of latent position graphs, adjacency spectral embedding followed by appropriate vertex classification is asymptotically Bayes optimal; but this approach requires knowledge of and critically depends on the model dimension. In this paper, we propose a sparse representation vertex classifier which does not require information about the model dimension. This classifier represents a test vertex as a sparse combination of the vertices in the training set and uses the recovered coefficients to classify the test vertex. We prove consistency of our proposed classifier for stochastic blockmodels, and demonstrate that the sparse representation classifier can predict vertex labels with higher accuracy than adjacency spectral embedding approaches via both simulation studies and real data experiments. Our results demonstrate the robustness and effectiveness of our proposed vertex classifier when the model dimension is unknown.
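For readers unfamiliar with the stochastic blockmodel that this abstract assumes, the following minimal sketch samples an undirected graph from one and checks the empirical edge density within and between blocks. It illustrates the model only, not the sparse representation classifier or adjacency spectral embedding; the function names, block sizes, and probabilities are hypothetical choices made for illustration.

```python
import random


def sample_sbm(block_sizes, block_probs, seed=0):
    """Sample a symmetric 0/1 adjacency matrix from a stochastic blockmodel.

    block_sizes[k] is the number of vertices in block k, and
    block_probs[a][b] is the edge probability between blocks a and b.
    """
    rng = random.Random(seed)
    labels = [k for k, size in enumerate(block_sizes) for _ in range(size)]
    n = len(labels)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Each edge is an independent Bernoulli draw that depends
            # only on the block memberships of its two endpoints.
            if rng.random() < block_probs[labels[i]][labels[j]]:
                adj[i][j] = adj[j][i] = 1
    return adj, labels


def empirical_block_density(adj, labels, a, b):
    """Fraction of vertex pairs spanning blocks a and b that are connected."""
    edges = pairs = 0
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            if {labels[i], labels[j]} == {a, b}:
                pairs += 1
                edges += adj[i][j]
    return edges / pairs


# Two blocks of 40 vertices: dense within blocks, sparse between them.
probs = [[0.6, 0.1], [0.1, 0.5]]
adj, labels = sample_sbm([40, 40], probs)
within_0 = empirical_block_density(adj, labels, 0, 0)
between = empirical_block_density(adj, labels, 0, 1)
print(round(within_0, 2), round(between, 2))
```

Vertex classification in this setting amounts to recovering `labels` for held-out vertices from `adj` and the labels of the training vertices.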
Spectral Clustering for Divide-and-Conquer Graph Matching
Lyzinski, Vince, Sussman, Daniel L., Fishkind, Donniell E., Pao, Henry, Chen, Li, Vogelstein, Joshua T., Park, Youngser, Priebe, Carey E.
We present a parallelized bijective graph matching algorithm that leverages seeds and is designed to match very large graphs. Our algorithm combines spectral graph embedding with existing state-of-the-art seeded graph matching procedures. We justify our approach by proving that modestly correlated, large stochastic block model random graphs are correctly matched utilizing very few seeds through our divide-and-conquer procedure. We also demonstrate the effectiveness of our approach in matching very large graphs in simulated and real data examples, showing up to a factor of 8 improvement in runtime with minimal sacrifice in accuracy.
Stochastic Blockmodeling for Online Advertising
Chen, Li (Johns Hopkins University) | Patton, Matthew (AOL Advertising.com)
Online advertising is an important and huge industry. Knowledge of website attributes can contribute greatly to business strategies for ad targeting, content display, inventory purchase, or revenue prediction. In this paper, we introduce stochastic blockmodeling for the website relations induced by online user visitation events. We propose two clustering algorithms to discover the intrinsic structures of websites, and compare their performance with a goodness-of-fit method and a deterministic graph partitioning method. We demonstrate the effectiveness of our algorithms on both simulations and an AOL website dataset.
Stochastic Blockmodeling for Online Advertising
Chen, Li, Patton, Matthew
Online advertising is an important and huge industry. Knowledge of website attributes can contribute greatly to business strategies for ad targeting, content display, inventory purchase, or revenue prediction. Classical inference on users and sites is challenging because the data are voluminous, sparse, high-dimensional, and noisy. In this paper, we introduce stochastic blockmodeling for the website relations induced by online user visitation events. We propose two clustering algorithms to discover the intrinsic structures of websites, and compare their performance with a goodness-of-fit method and a deterministic graph partitioning method. We demonstrate the effectiveness of our algorithms on both simulations and an AOL website dataset.
Unsupervised deconvolution of dynamic imaging reveals intratumor vascular heterogeneity
Chen, Li, Choyke, Peter L., Wang, Niya, Clarke, Robert, Bhujwalla, Zaver M., Hillman, Elizabeth M. C., Wang, Yue
Intratumor heterogeneity is often manifested by vascular compartments with distinct pharmacokinetics that cannot be resolved directly by in vivo dynamic imaging. We developed tissue-specific compartment modeling (TSCM), an unsupervised computational method of deconvolving dynamic imaging series from heterogeneous tumors that can improve vascular phenotyping in many biological contexts. Applying TSCM to dynamic contrast-enhanced MRI of breast cancers revealed characteristic intratumor vascular heterogeneity and therapeutic responses that were otherwise undetectable.
User-Involved Preference Elicitation for Product Search and Recommender Systems
Pu, Pearl (Ecole Polytechnique Fédérale de Lausanne (EPFL)) | Chen, Li (Ecole Polytechnique Fédérale de Lausanne (EPFL))
We address user-system interaction issues in product search and recommender systems: how to help users select the most preferred item from a large collection of alternatives. As such systems must crucially rely on an accurate and complete model of user preferences, the acquisition of this model becomes the central subject of our paper. Many tools used today do not satisfactorily assist users in establishing this model because they do not adequately focus on fundamental decision objectives, help them reveal hidden preferences, revise conflicting preferences, or explicitly reason about tradeoffs. In this article, we provide some analyses of common areas of design pitfalls and derive a set of design guidelines that assist the user in avoiding these problems in three important areas: user preference elicitation, preference revision, and explanation interfaces.
User-Involved Preference Elicitation for Product Search and Recommender Systems
Pu, Pearl (Ecole Polytechnique Fédérale de Lausanne (EPFL)) | Chen, Li (Ecole Polytechnique Fédérale de Lausanne (EPFL))
We address user-system interaction issues in product search and recommender systems: how to help users select the most preferred item from a large collection of alternatives. As such systems must crucially rely on an accurate and complete model of user preferences, the acquisition of this model becomes the central subject of our paper. Many tools used today do not satisfactorily assist users in establishing this model because they do not adequately focus on fundamental decision objectives, help them reveal hidden preferences, revise conflicting preferences, or explicitly reason about tradeoffs. As a result, users fail to find the outcomes that best satisfy their needs and preferences. In this article, we provide some analyses of common areas of design pitfalls and derive a set of design guidelines that assist the user in avoiding these problems in three important areas: user preference elicitation, preference revision, and explanation interfaces. For each area, we describe the state of the art of the developed techniques and discuss concrete scenarios where they have been applied and tested.