Clustering by Directly Disentangling Latent Space

arXiv.org Machine Learning

Learning latent feature representations for clustering has been widely studied recently as a way to overcome the high dimensionality of data. However, learning "cluster-friendly" latent representations remains challenging because clustering is unsupervised. In this paper, we propose Disentangling Latent Space Clustering (DLS-Clustering), a new clustering mechanism that directly learns cluster assignments while disentangling the latent space, without constructing a "cluster-friendly" latent representation or applying an additional clustering method. We achieve a bidirectional mapping by enforcing that an inference network (i.e., an encoder) and the generator of a GAN form a deterministic encoder-decoder pair, using a maximum mean discrepancy (MMD)-based regularization. We use a weight-sharing procedure to disentangle the latent space into one-hot discrete latent variables and continuous latent variables; this disentangling process effectively performs the clustering. The one-hot discrete latent variables can then be read directly as cluster assignments, while the continuous latent variables represent the remaining unspecified factors. Experiments on six benchmark datasets of different types show that our method outperforms existing state-of-the-art methods. We further show that the latent representations learned by DLS-Clustering retain the ability to generate diverse, high-quality images, which supports further application scenarios.
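The two ingredients named above, a latent code split into a one-hot block and a continuous block, and an MMD penalty between distributions, can be sketched in NumPy. This is an illustration under stated assumptions, not the authors' implementation: the Gaussian kernel, its bandwidth, the biased (V-statistic) MMD estimator, and the helper names `sample_prior`, `cluster_from_latent`, and `mmd2` are all our own choices.

```python
import numpy as np


def sample_prior(n, n_clusters, cont_dim, rng):
    """Sample latent codes laid out as [one-hot discrete | continuous Gaussian]."""
    labels = rng.integers(0, n_clusters, size=n)
    one_hot = np.eye(n_clusters)[labels]
    cont = rng.normal(size=(n, cont_dim))
    return np.hstack([one_hot, cont]), labels


def cluster_from_latent(z, n_clusters):
    """Read the cluster assignment directly off the discrete block of the code."""
    return np.argmax(z[:, :n_clusters], axis=1)


def gaussian_kernel(a, b, bandwidth):
    """Gaussian (RBF) kernel matrix between the rows of a and b."""
    sq_dists = (np.sum(a**2, axis=1)[:, None]
                + np.sum(b**2, axis=1)[None, :]
                - 2.0 * a @ b.T)
    return np.exp(-sq_dists / (2.0 * bandwidth**2))


def mmd2(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD with a Gaussian kernel."""
    k_xx = gaussian_kernel(x, x, bandwidth)
    k_yy = gaussian_kernel(y, y, bandwidth)
    k_xy = gaussian_kernel(x, y, bandwidth)
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()
```

In a training loop of this kind, `mmd2` would compare a batch of encoder outputs with a batch drawn from `sample_prior`, and after training the cluster of an input would simply be `cluster_from_latent` applied to its encoding.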


CHEETAH: An Ultra-Fast, Approximation-Free, and Privacy-Preserved Neural Network Framework based on Joint Obscure Linear and Nonlinear Computations

arXiv.org Machine Learning

Machine Learning as a Service (MLaaS) is enabling a wide range of smart applications on end devices. However, this convenience comes at a cost to privacy, because users have to upload their private data to the cloud. This research aims to provide effective and efficient MLaaS in which the cloud server learns nothing about user data and the users cannot infer the proprietary model parameters owned by the server. This work makes the following contributions. First, it identifies the fundamental performance bottlenecks of existing schemes: the heavy permutations required to compute linear transformations and the communication-intensive Garbled Circuits used for nonlinear transformations. Second, it introduces an ultra-fast secure MLaaS framework, CHEETAH, which features a carefully crafted secret sharing scheme that runs significantly faster than existing schemes without accuracy loss. Third, CHEETAH is evaluated on well-known, practical deep networks such as AlexNet and VGG-16 on the MNIST and ImageNet datasets. The results show a speedup of more than 100x over GAZELLE (Usenix Security'18), the fastest existing scheme, 2000x over MiniONN (ACM CCS'17), and five orders of magnitude over CryptoNets (ICML'16). This significant speedup enables a wide range of practical applications based on privacy-preserving deep neural networks.
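The abstract does not specify CHEETAH's secret sharing scheme. As a generic illustration of why secret sharing makes linear layers cheap, the sketch below uses plain two-party additive sharing over integers modulo a prime: because a linear map distributes over addition, the weights can be applied to each share separately and the results still add up to W·x. The modulus, the integer encoding, and the function names are hypothetical, and the sketch shows only this linearity property; it is not CHEETAH's protocol, which must also keep the weights hidden from the client and handle nonlinear layers.

```python
import numpy as np

P = 2**31 - 1  # hypothetical prime modulus for the share ring


def share(x, rng):
    """Split an integer vector into two additive shares modulo P.

    Each share on its own is uniformly random and reveals nothing about x.
    """
    r = rng.integers(0, P, size=x.shape, dtype=np.int64)
    return r, (x - r) % P


def linear_on_share(w, s):
    """Apply an integer weight matrix to one share, with no interaction needed."""
    return (w @ s) % P


def reconstruct(a, b):
    """Recombine two shares of the same value."""
    return (a + b) % P
```

For example, sharing `x = [3, 1, 4, 1]`, applying a small integer matrix `w` to each share, and recombining gives `(w @ x) % P`, even though neither share alone carries information about `x`.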


92c/MFlops/s, Ultra-Large-Scale Neural-Network Training on a PIII Cluster

arXiv.org Machine Learning

Artificial neural networks with millions of adjustable parameters and a similar number of training examples are a potential solution for difficult, large-scale pattern recognition problems in areas such as speech and face recognition, classification of large volumes of web data, and finance. The bottleneck is that neural network training involves iterative gradient descent and is extremely computationally intensive. In this paper, we present a technique for distributed training of Ultra Large Scale Neural Networks (ULSNN) on Bunyip, a Linux-based cluster of 196 Pentium III processors. To illustrate ULSNN training, we describe an experiment in which a neural network with 1.73 million adjustable parameters was trained to recognize machine-printed Japanese characters from a database containing 9 million training patterns. Training runs at an average performance of 163.3 GFlops/s (single precision). With a machine cost of $150,913, this yields a price/performance ratio of 92.4¢/MFlops/s (single precision). For comparison, training in double precision with the ATLAS DGEMM sustains 70 GFlops/s, or $2.16/MFlops/s (double precision).
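Both price/performance figures follow from dividing the machine cost by the sustained throughput expressed in MFlops/s. The short check below reproduces them; note that the $2.16/MFlops/s figure corresponds to a sustained rate of 70 GFlops/s (70,000 MFlops/s). The helper name `price_performance` is ours.

```python
def price_performance(machine_cost_usd, sustained_gflops):
    """Return the cost in US dollars per sustained MFlops/s."""
    sustained_mflops = sustained_gflops * 1000.0  # 1 GFlops/s = 1000 MFlops/s
    return machine_cost_usd / sustained_mflops


COST = 150_913  # machine cost in US dollars, from the abstract

single = price_performance(COST, 163.3)  # single-precision training run
double = price_performance(COST, 70.0)   # double-precision ATLAS DGEMM run

print(f"single precision: {single * 100:.1f} cents per MFlops/s")
print(f"double precision: ${double:.2f} per MFlops/s")
```

Running it prints 92.4 cents per MFlops/s for single precision and $2.16 per MFlops/s for double precision.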


Nonconvex Stochastic Nested Optimization via Stochastic ADMM

arXiv.org Machine Learning

We consider the stochastic nested composition optimization problem, where the objective is a composition of two expected-value functions. We propose a stochastic ADMM to solve this problem. To find an $\epsilon$-stationary point, where the expected norm of the subgradient of the corresponding augmented Lagrangian is smaller than $\epsilon$, our method has a total sample complexity of $\mathcal{O}(\epsilon^{-3})$ in the online case and $\mathcal{O} \Bigl((2N_1 + N_2) + (2N_1 + N_2)^{1/2}\epsilon^{-2}\Bigr)$ in the finite-sum case. This complexity matches that of the proximal version proposed in \cite{zhang2019multi}, but our algorithm can solve more general problems in which the proximal mapping of the penalty is not easy to compute.
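The abstract does not give the update rules. As a rough illustration of the general idea of an ADMM whose primal step uses a stochastic (minibatch) gradient instead of an exact minimization, here is a linearized stochastic ADMM on a toy lasso problem. It is a swapped-in, single-level example: it does not model the nested composition of two expectations, and the problem, step size `eta`, penalty `rho`, and batch size are our own assumptions.

```python
import numpy as np


def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def stochastic_admm_lasso(a, b, lam, rho=1.0, eta=0.1, batch=20,
                          iters=3000, seed=0):
    """Linearized stochastic ADMM (scaled form) for the toy problem

        min_x (1/2n) ||a x - b||^2 + lam * ||z||_1   subject to   x = z.

    The x-step replaces the smooth loss by its minibatch linearization
    plus a proximal term, so it has a closed form.
    """
    rng = np.random.default_rng(seed)
    n, d = a.shape
    x = np.zeros(d)
    z = np.zeros(d)
    u = np.zeros(d)  # scaled dual variable
    for _ in range(iters):
        idx = rng.choice(n, size=batch, replace=False)
        grad = a[idx].T @ (a[idx] @ x - b[idx]) / batch  # minibatch gradient
        # argmin_x <grad, x> + rho/2 ||x - z + u||^2 + 1/(2 eta) ||x - x_k||^2
        x = (x / eta + rho * (z - u) - grad) / (1.0 / eta + rho)
        z = soft_threshold(x + u, lam / rho)  # exact prox step for the l1 term
        u = u + x - z                         # dual ascent on x = z
    return x, z
```

At a fixed point, `x == z`, the gradient of the smooth loss equals `-rho * u`, and `rho * u` is a subgradient of `lam * ||z||_1`, which are exactly the lasso optimality conditions; the minibatch noise only makes the iterates hover near that point.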


Negative sampling in semi-supervised learning

arXiv.org Machine Learning

We introduce Negative Sampling in Semi-Supervised Learning (NS3L), a simple, fast, and easy-to-tune algorithm for semi-supervised learning (SSL). NS3L is motivated by the success of negative sampling/contrastive estimation. We demonstrate that adding the NS3L loss to state-of-the-art SSL algorithms, such as Virtual Adversarial Training (VAT), significantly improves on vanilla VAT and its variant, VAT with Entropy Minimization. By adding the NS3L loss to MixMatch, the current state-of-the-art approach on semi-supervised tasks, we observe significant improvements over vanilla MixMatch. We conduct extensive experiments on the CIFAR10, CIFAR100, SVHN, and STL10 benchmark datasets.
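The abstract names negative sampling as the motivation but gives no formula. One plausible form, sketched here under that assumption, treats classes that the model already considers unlikely for an unlabeled example as negative labels and penalizes their remaining probability with -log(1 - p). The threshold rule, the penalty, and the function names are our own choices, not necessarily the paper's exact loss; in use, such a term would be added, with a weight, to the loss of an existing SSL method such as VAT or MixMatch.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with max-subtraction for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)


def negative_sampling_loss(logits, threshold=0.3, eps=1e-12):
    """Hypothetical NS3L-style loss on a batch of unlabeled examples.

    Classes whose predicted probability is below `threshold` are treated as
    negative labels for that example; the loss pushes their probability
    further toward zero via -log(1 - p_k), averaged over the negatives of
    each example and then over the batch.
    """
    p = softmax(logits)
    negatives = p < threshold  # boolean mask of sampled negative classes
    per_class = -np.log(np.clip(1.0 - p, eps, None)) * negatives
    counts = np.maximum(negatives.sum(axis=1), 1)  # avoid division by zero
    return float(np.mean(per_class.sum(axis=1) / counts))
```

With uniform logits over four classes, every class has probability 0.25 and counts as a negative, so the loss equals -log(0.75); a confident prediction leaves very little probability on the negatives and yields a much smaller loss.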


Coordination Group Formation for OnLine Coordinated Routing Mechanisms

arXiv.org Machine Learning

This study views the collective route choices of travelers en route as a resolution of their competition for network routes. Understanding this competition well and coordinating travelers' route choices helps mitigate urban traffic congestion. Although existing studies have developed such mechanisms (e.g., the CRM [1]), we still lack a quantitative method to evaluate the coordination potential and to identify proper coordination groups (CGs) for implementing the CRM. As a result, these mechanisms face prohibitive computational costs when implemented with many opt-in travelers. Motivated by this view, this study develops mathematical approaches to quantify the coordination potential between two travelers and among multiple travelers. Next, we develop the adaptive centroid-based clustering algorithm (ACCA), which splits travelers en route in a local network into CGs, each with a proper size and strong coordination potential. Moreover, the ACCA is statistically guaranteed to stop at a locally optimal clustering solution, which balances the intra-cluster and inter-cluster coordination potential. It can also be implemented with parallel computation to improve its computational efficiency. Furthermore, we propose a clustering-based coordinated routing mechanism (CB-CRM), which implements a CRM on each individual CG. Numerical experiments on both the Sioux Falls and Hardee city networks show that the ACCA efficiently forms proper coordination groups, so that, compared to the CRM, the CB-CRM significantly improves computational efficiency with only a minor loss of system performance in a large network. This advantage becomes more pronounced under high penetration and congested traffic conditions. Finally, the experiments validate the desirable properties of the ACCA and the value of parallel computation.


Incentivized Exploration for Multi-Armed Bandits under Reward Drift

arXiv.org Machine Learning

We study incentivized exploration for the multi-armed bandit (MAB) problem, where players receive compensation for exploring arms other than the greedy choice and may provide biased feedback on the reward. We seek to understand the impact of this drifted reward feedback by analyzing the performance of three instantiations of the incentivized MAB algorithm: UCB, $\varepsilon$-Greedy, and Thompson Sampling. Our results show that all three achieve $\mathcal{O}(\log T)$ regret and compensation under the drifted reward and are therefore effective in incentivizing exploration. Numerical examples are provided to complement the theoretical analysis.
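To make the setting concrete, the sketch below simulates an incentivized UCB principal with drifted feedback. The modeling choices are our own assumptions, not the paper's exact model: arms give Bernoulli rewards, a myopic player would pull the arm with the highest empirical mean, the compensation is the gap between that arm's empirical mean and the recommended arm's, and the reported reward is inflated by `drift` times the compensation.

```python
import numpy as np


def incentivized_ucb(means, horizon, drift=0.1, seed=0):
    """Toy incentivized UCB with drifted reward feedback (illustrative model).

    Returns (pull counts per arm, cumulative regret, cumulative compensation).
    """
    rng = np.random.default_rng(seed)
    k = len(means)
    counts = np.zeros(k)
    sums = np.zeros(k)  # sums of *reported* (possibly drifted) rewards
    regret = 0.0
    compensation = 0.0
    best = max(means)
    for t in range(horizon):
        if t < k:
            arm = t  # pull every arm once to initialise the estimates
            pay = 0.0
        else:
            emp = sums / counts
            ucb = emp + np.sqrt(2.0 * np.log(t) / counts)  # UCB1 index
            arm = int(np.argmax(ucb))
            greedy = int(np.argmax(emp))  # what a myopic player would pick
            pay = emp[greedy] - emp[arm]  # zero when UCB agrees with greedy
        reward = float(rng.random() < means[arm])  # Bernoulli reward
        reported = reward + drift * pay  # biased (drifted) feedback
        counts[arm] += 1
        sums[arm] += reported
        regret += best - means[arm]
        compensation += pay
    return counts, regret, compensation
```

For example, `incentivized_ucb([0.9, 0.1], 2000)` pulls the better arm in the large majority of rounds, and both the regret and the total compensation stay small relative to the horizon, in line with the logarithmic behavior stated above.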


Detecting Patterns of Physiological Response to Hemodynamic Stress via Unsupervised Deep Learning

arXiv.org Machine Learning

Monitoring physiological responses to hemodynamic stress can help in determining appropriate treatment and ensuring good patient outcomes. Physicians' intuition suggests that the human body has a number of physiological response patterns to hemorrhage that escalate as blood loss continues; however, the exact etiology and phenotypes of such responses are either not well known or understood only at a coarse level. Although previous research has shown that machine learning models can perform well in hemorrhage detection and survival prediction, it is unclear whether machine learning could help to identify and characterize the underlying physiological responses in raw vital sign data. We approach this problem by first transforming the high-dimensional vital sign time series into a tractable, lower-dimensional latent space using a dilated, causal convolutional encoder model trained in a purely unsupervised manner. Second, we identify informative clusters in the embeddings. By analyzing the clusters of latent embeddings and visualizing them over time, we hypothesize that the clusters correspond to the physiological response patterns that match physicians' intuition. Furthermore, we attempt to evaluate the latent embeddings using a variety of methods, such as predicting the cluster labels from explainable features.
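A dilated causal convolution computes each output only from the current and earlier time steps, with gaps (the dilation) between kernel taps, so stacking layers with dilations 1, 2, 4, ... grows the receptive field exponentially. The NumPy sketch below shows only this forward pass; the kernel size, the dilations, the ReLU activations, and using the last time step as the embedding are our assumptions, and the unsupervised training objective is omitted.

```python
import numpy as np


def causal_conv1d(x, w, dilation):
    """Dilated causal 1-D convolution.

    x: (T, C_in) time series; w: (K, C_in, C_out) kernel.
    y[t] = sum_j x[t - j * dilation] @ w[j], with x[t] = 0 for t < 0,
    so each output depends only on the current and past inputs.
    """
    k = w.shape[0]
    pad = (k - 1) * dilation
    xp = np.vstack([np.zeros((pad, x.shape[1])), x])  # pad on the past side only
    t_len = x.shape[0]
    y = np.zeros((t_len, w.shape[2]))
    for j in range(k):
        start = pad - j * dilation  # xp[start + t] == x[t - j * dilation]
        y += xp[start:start + t_len] @ w[j]
    return y


def causal_encoder(x, weights, dilations):
    """Stack of dilated causal convolutions with ReLU activations.

    Returns per-time-step features; the last row summarises the whole
    window and could serve as a latent embedding for clustering.
    """
    h = x
    for w, d in zip(weights, dilations):
        h = np.maximum(causal_conv1d(h, w, d), 0.0)
    return h
```

With kernel size 2 and dilations 1, 2, 4, the receptive field is 1 + (2 - 1)(1 + 2 + 4) = 8 time steps: an impulse at t = 0 affects outputs at t = 0..7 and nothing later, and changing an input at time t never changes earlier outputs.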


Time-Aware Prospective Modeling of Users for Online Display Advertising

arXiv.org Machine Learning

Prospective display advertising poses a great challenge for large advertising platforms because the strongest predictive signals about users are not eligible for use in conversion prediction systems. To that end, efforts are made to collect as much information as possible about each user from various data sources and to design powerful models that can capture weaker signals, ultimately producing good-quality estimates of conversion probability. In this study, we propose a novel time-aware approach to model heterogeneous sequences of users' activities and capture implicit signals of users' conversion intents. On two real-world datasets, we show that our approach outperforms previously proposed approaches while providing interpretability of each signal's impact on conversion probability.


Efficient Fair Principal Component Analysis

arXiv.org Machine Learning

The growing body of fairness assessments of machine learning algorithms has shown that dimensionality reduction methods such as PCA treat data from different sensitive groups unfairly. In particular, when data from different groups are aggregated, the reconstruction error of the learned subspace becomes biased toward some populations, which can inherently hurt or benefit those groups and lead to an unfair representation. On the other hand, alleviating this bias to protect sensitive groups when learning the optimal projection leads to a higher overall reconstruction error. This introduces a trade-off between the sacrifices and benefits of sensitive groups and the overall reconstruction error. In this paper, in pursuit of fairness criteria in PCA, we introduce a more efficient notion of Pareto fairness, cast Pareto-fair dimensionality reduction as a multi-objective optimization problem, and propose an adaptive gradient-based algorithm to solve it. Using the notion of Pareto optimality, we guarantee that the solution of our proposed algorithm belongs to the Pareto frontier for all groups, achieving the optimal trade-off between these conflicting objectives. The framework also generalizes efficiently to sensitive features with multiple groups. We provide a convergence analysis of our algorithm for both convex and non-convex objectives and show its efficacy through empirical studies on different datasets, in comparison with the state-of-the-art algorithm.
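The unfairness described above can be made concrete by measuring each group's reconstruction error under a shared subspace. The NumPy sketch below assumes zero-mean data and uses mean squared reconstruction error per group; to show the trade-off it sweeps a simple linear scalarization w·C_a + (1 − w)·C_b of the groups' second-moment matrices. That sweep is a swapped-in illustration, not the paper's adaptive gradient-based Pareto algorithm, and all function names are ours.

```python
import numpy as np


def top_k_subspace(cov, k):
    """Orthonormal basis (d x k) of the top-k eigenvectors of a symmetric matrix."""
    _, vecs = np.linalg.eigh(cov)  # eigenvalues come back in ascending order
    return vecs[:, -k:]


def recon_error(x, basis):
    """Mean squared reconstruction error of the rows of x projected onto basis."""
    residual = x - x @ basis @ basis.T
    return float(np.mean(np.sum(residual**2, axis=1)))


def second_moment(x):
    """Second-moment matrix; equals the covariance for zero-mean data."""
    return x.T @ x / x.shape[0]


def scalarized_sweep(x_a, x_b, k, weights):
    """Per-group errors for subspaces fit to w * C_a + (1 - w) * C_b.

    A linear-scalarization sweep for illustration only, not the paper's
    adaptive gradient-based Pareto algorithm.
    """
    c_a, c_b = second_moment(x_a), second_moment(x_b)
    results = []
    for w in weights:
        basis = top_k_subspace(w * c_a + (1.0 - w) * c_b, k)
        results.append((w, recon_error(x_a, basis), recon_error(x_b, basis)))
    return results
```

For example, with a large group whose variance lies mostly along the first axis and a small group whose variance lies mostly along the second, standard PCA on the pooled data picks the first axis and leaves the small group with a much larger reconstruction error; shifting the weight toward the small group lowers its error at the expense of the large group.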