Xu, Jinhui
Differentially Private Empirical Risk Minimization Revisited: Faster and More General
Wang, Di, Ye, Minwei, Xu, Jinhui
In this paper we study the differentially private Empirical Risk Minimization (ERM) problem in different settings. For smooth (strongly) convex loss function with or without (non)-smooth regularization, we give algorithms that achieve either optimal or near optimal utility bounds with less gradient complexity compared with previous work. For ERM with smooth convex loss function in high-dimensional (p n) setting, we give an algorithm which achieves the upper bound with less gradient complexity than previous ones. At last, we generalize the expected excess empirical risk from convex loss functions to non-convex ones satisfying the Polyak-Lojasiewicz condition and give a tighter upper bound on the utility than the one in [34].
Novel Geometric Approach for Global Alignment of PPI Networks
Liu, Yangwei (State University of New York at Buffalo) | Ding, Hu (Michigan State University) | Chen, Danyang (State University of New York at Buffalo) | Xu, Jinhui (State University of New York at Buffalo)
In this paper we present a novel geometric method for the problem of global pairwise alignment of protein-protein interaction (PPI) networks. A PPI network can be viewed as a node-edge graph and its alignment often needs to solve some generalized version of the subgraph isomorphism problem which is notoriously challenging and NP-hard. All existing research has focused on designing algorithms with good practical performance. In this paper we propose a two-step algorithm for the global pairwise PPI network alignment which consists of a Geometric Step and an MCMF Step. Our algorithm first applies a graph embedding technique that preserves the topological structure of the original PPI networks and maps the problem from graph domain to geometric domain, and computes a rigid transformation for one of the embedded PPI networks so as to minimize its Earth Mover's Distance (EMD) to the other PPI network. It then solves a Min-Cost Max-Flow problem using the (scaled) inverse of sequence similarity scores as edge weight. By using the flow values from the two steps (i.e., EMD and Min-Cost Max-Flow) as the matching scores, we are able to combine the two matching results to obtain the desired alignment. Unlike other popular alignment algorithms which are either greedy or incremental, our algorithm globally optimizes the problem to yield an alignment with better quality.
Random Gradient Descent Tree: A Combinatorial Approach for SVM with Outliers
Ding, Hu (State University of New York at Buffalo) | Xu, Jinhui (State University of New York at Buffalo)
Support Vector Machine (SVM) is a fundamental technique in machine learning. A long time challenge facing SVM is how to deal with outliers (caused by mislabeling), as they could make the classes in SVM nonseparable. Existing techniques, such as soft margin SVM, ฮฝ-SVM, and Core-SVM, can alleviate the problem to certain extent, but cannot completely resolve the issue. Recently, there are also techniques available for explicit outlier removal. But they suffer from high time complexity and cannot guarantee quality of solution. In this paper, we present a new combinatorial approach, called Random Gradient Descent Tree (or RGD-tree), to explicitly deal with outliers; this results in a new algorithm called RGD-SVM. Our technique yields provably good solution and can be efficiently implemented for practical purpose. The time and space complexities of our approach only linearly depend on the input size and the dimensionality of the space, which are significantly better than existing ones. Experiments on benchmark datasets suggest that our technique considerably outperforms several popular techniques in most of the cases.
Clustering-Based Collaborative Filtering for Link Prediction
Wang, Xiangyu (State University of New York at Buffalo) | He, Dayu (State University of New York at Buffalo) | Chen, Danyang (State University of New York at Buffalo) | Xu, Jinhui (State University of New York at Buffalo)
In this paper, we propose a novel collaborative filtering approach for predicting the unobserved links in a network (or graph) with both topological and node features. Our approach improves the well-known compressed sensing based matrix completion method by introducing a new multiple-independent-Bernoulli-distribution model as the data sampling mask. It makes better link predictions since the model is more general and better matches the data distributions in many real-world networks, such as social networks like Facebook. As a result, a satisfying stability of the prediction can be guaranteed. To obtain an accurate multiple-independent-Bernoulli-distribution model of the topological feature space, our approach adjusts the sampling of the adjacency matrix of the network (or graph) using the clustering information in the node feature space. This yields a better performance than those methods which simply combine the two types of features. Experimental results on several benchmark datasets suggest that our approach outperforms the best existing link prediction methods.
Finding Median Point-Set Using Earth Mover's Distance
Ding, Hu (State University of New York at Buffalo) | Xu, Jinhui (State University of New York at Buffalo)
In this paper, we study a prototype learning problem, called Median Point-Set, whose objective is to construct a prototype for a set of given point-sets so as to minimize the total Earth Mover's Distances (EMD) between the prototype and the point-sets, where EMD between two point-sets is measured under affine transformation. For this problem, we present the first purely geometric approach. Comparing to existing graph-based approaches (e.g., median graph, shock graph), our approach has several unique advantages: (1) No encoding and decoding procedures are needed to map between objects and graphs, and therefore avoid errors caused by information losing during the mappings; (2) Staying only in the geometric domain makes our approach computationally more efficient and robust to noise. We evaluate the performance of our technique for prototype reconstruction on a random dataset and a benchmark dataset, handwriting Chinese characters. Experiments suggest that our technique considerably outperforms the existing graph-based methods.
k-Prototype Learning for 3D Rigid Structures
Ding, Hu, Berezney, Ronald, Xu, Jinhui
In this paper, we study the following new variant of prototype learning, called {\em $k$-prototype learning problem for 3D rigid structures}: Given a set of 3D rigid structures, find a set of $k$ rigid structures so that each of them is a prototype for a cluster of the given rigid structures and the total cost (or dissimilarity) is minimized. Prototype learning is a core problem in machine learning and has a wide range of applications in many areas. Existing results on this problem have mainly focused on the graph domain. In this paper, we present the first algorithm for learning multiple prototypes from 3D rigid structures. Our result is based on a number of new insights to rigid structures alignment, clustering, and prototype reconstruction, and is practically efficient with quality guarantee. We validate our approach using two type of data sets, random data and biological data of chromosome territories. Experiments suggest that our approach can effectively learn prototypes in both types of data.
Ensemble Clustering using Semidefinite Programming
Singh, Vikas, Mukherjee, Lopamudra, Peng, Jiming, Xu, Jinhui
We consider the ensemble clustering problem where the task is to'aggregate' multiple clustering solutions into a single consolidated clustering that maximizes the shared information among given clustering solutions. We obtain several new results for this problem. First, we note that the notion of agreement under such circumstances can be better captured using an agreement measure based on a 2D string encoding rather than voting strategy based methods proposed in literature. Using this generalization, we first derive a nonlinear optimization model to maximize the new agreement measure. We then show that our optimization problem can be transformed into a strict 0-1 Semidefinite Program (SDP) via novel convexification techniques which can subsequently be relaxed to a polynomial time solvable SDP. Our experiments indicate improvements not only in terms of the proposed agreement measure but also the existing agreement measures based on voting strategies. We discuss evaluations on clustering and image segmentation databases.
Ensemble Clustering using Semidefinite Programming
Singh, Vikas, Mukherjee, Lopamudra, Peng, Jiming, Xu, Jinhui
We consider the ensemble clustering problem where the task is to'aggregate' multiple clustering solutions into a single consolidated clustering that maximizes the shared information among given clustering solutions. We obtain several new results for this problem. First, we note that the notion of agreement under such circumstances can be better captured using an agreement measure based on a 2D string encoding rather than voting strategy based methods proposed in literature. Using this generalization, we first derive a nonlinear optimization model to maximize thenew agreement measure. We then show that our optimization problem can be transformed into a strict 0-1 Semidefinite Program (SDP) via novel convexification techniqueswhich can subsequently be relaxed to a polynomial time solvable SDP. Our experiments indicate improvements not only in terms of the proposed agreement measure but also the existing agreement measures based on voting strategies. We discuss evaluations on clustering and image segmentation databases.