 Wang, Di


$p$-Norm Flow Diffusion for Local Graph Clustering

arXiv.org Machine Learning

Local graph clustering and the closely related seed set expansion problem are primitives on graphs that are central to a wide range of analytic and learning tasks such as local clustering, community detection, node ranking and feature inference. Prior work on local graph clustering mostly falls into two categories, with numerical and combinatorial roots respectively. In this work, we draw inspiration from both fields and propose a family of convex optimization formulations based on the idea of diffusion with $p$-norm network flow for $p\in (1,\infty)$. In the context of local clustering, we characterize the optimal solutions for these optimization problems and show their usefulness in finding low conductance cuts around the input seed set. In particular, we achieve a quadratic approximation of conductance in the case of $p=2$, similar to the Cheeger-type bounds of spectral methods, a constant factor approximation when $p\rightarrow\infty$, similar to max-flow based methods, and a smooth transition for general $p$ values in between. Thus, our optimization formulation can be viewed as bridging the numerical and combinatorial approaches, and we can achieve the best of both worlds in terms of speed and noise robustness. We show that the proposed problem can be solved in strongly local running time for $p\ge 2$, and we conduct empirical evaluations on both synthetic and real-world graphs to illustrate that our approach compares favorably with existing methods.
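
The rounding step shared by such diffusion methods is easy to illustrate: run a diffusion from the seed, sort nodes by the resulting embedding, and take the best-conductance prefix (a sweep cut). Below is a minimal sketch assuming a toy graph, a simple projected-gradient diffusion for the $p=2$ case, and illustrative step sizes; it is not the paper's solver.

```python
# Sweep-cut rounding of a diffusion vector on a toy graph (illustrative only).
import numpy as np

def sweep_cut(A, x):
    """Return the best-conductance prefix set when nodes are sorted by x."""
    deg = A.sum(axis=1)
    vol_total = deg.sum()
    order = np.argsort(-x)
    in_set = np.zeros(len(x), dtype=bool)
    vol, cut, best = 0.0, 0.0, (np.inf, None)
    for k, v in enumerate(order):
        # adding v: its edges into the current set stop being cut edges
        inside = A[v, in_set].sum()
        cut += deg[v] - 2 * inside
        vol += deg[v]
        in_set[v] = True
        denom = min(vol, vol_total - vol)
        if denom > 0 and cut / denom < best[0]:
            best = (cut / denom, order[:k + 1].copy())
    return best

# toy graph: two 4-cliques joined by a single edge
n = 8
A = np.zeros((n, n))
for block in (range(4), range(4, 8)):
    for i in block:
        for j in block:
            if i != j:
                A[i, j] = 1
A[3, 4] = A[4, 3] = 1

# p=2-style diffusion: projected gradient steps x <- max(x + h*(s - Lx), 0)
L = np.diag(A.sum(axis=1)) - A
s = np.zeros(n); s[0] = 2.0          # seed mass at node 0
x = np.zeros(n)
for _ in range(50):
    x = np.maximum(x + 0.1 * (s - L @ x), 0)

phi, cluster = sweep_cut(A, x)
print("conductance %.3f, cluster %s" % (phi, [int(v) for v in sorted(cluster)]))
```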


Towards Assessment of Randomized Smoothing Mechanisms for Certifying Adversarial Robustness

arXiv.org Machine Learning

As a certified defensive technique, randomized smoothing has received considerable attention due to its scalability to large datasets and neural networks. However, several important questions remain unanswered, such as (i) whether the Gaussian mechanism is an appropriate option for certifying $\ell_2$-norm robustness, and (ii) whether there is an appropriate randomized (smoothing) mechanism to certify $\ell_\infty$-norm robustness. To shed light on these questions, we argue that the main difficulty is how to assess the appropriateness of each randomized mechanism. In this paper, we propose a generic framework that connects the existing frameworks in \cite{lecuyer2018certified, li2019certified} to assess randomized mechanisms. Under our framework, for a randomized mechanism that can certify a certain extent of robustness, we define the magnitude of its required additive noise as the metric for assessing its appropriateness. We also prove lower bounds on this metric for the $\ell_2$-norm and $\ell_\infty$-norm cases as the criteria for assessment. Based on our framework, we assess the Gaussian and Exponential mechanisms by comparing the magnitude of additive noise required by these mechanisms and the lower bounds (criteria). We first conclude that the Gaussian mechanism is indeed an appropriate option to certify $\ell_2$-norm robustness. Surprisingly, we show that the Gaussian mechanism is also an appropriate option for certifying $\ell_\infty$-norm robustness, instead of the Exponential mechanism. Finally, we generalize our framework to $\ell_p$-norm for any $p\geq2$. Our theoretical findings are verified by evaluations on CIFAR10 and ImageNet.
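
For the Gaussian mechanism in particular, the certification has a standard closed form: if the smoothed classifier's top class has probability at least $p_A$ under Gaussian noise of scale $\sigma$, the prediction is stable within an $\ell_2$ radius of $\sigma\,\Phi^{-1}(p_A)$ in the binary case. A minimal sketch, with a hypothetical toy base classifier and a plain Monte Carlo estimate of $p_A$ (no confidence correction):

```python
# Gaussian randomized smoothing with the standard l2 certified radius.
import numpy as np
from scipy.stats import norm

def base_classifier(x):
    # hypothetical base classifier: sign of the first coordinate
    return int(x[0] > 0)

def smoothed_predict(x, sigma=0.5, n=10000, seed=0):
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma, size=(n, x.shape[0]))
    votes = np.array([base_classifier(x + z) for z in noise])
    top = int(votes.mean() > 0.5)
    p_a = max(votes.mean(), 1 - votes.mean())   # top-class frequency
    # certified l2 radius for the Gaussian mechanism: R = sigma * Phi^{-1}(p_a)
    radius = sigma * norm.ppf(min(p_a, 1 - 1e-9))
    return top, radius

x = np.array([0.8, -0.3])
label, r = smoothed_predict(x)
print("predicted class %d, certified l2 radius %.3f" % (label, r))
```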


Faster width-dependent algorithm for mixed packing and covering LPs

Neural Information Processing Systems

In this paper, we give a faster width-dependent algorithm for mixed packing-covering LPs. Mixed packing-covering LPs are fundamental to combinatorial optimization in computer science and operations research. Our algorithm finds a $(1+\varepsilon)$-approximate solution in time $O(Nw/\varepsilon)$, where $N$ is the number of nonzero entries in the constraint matrix and $w$ is the maximum number of nonzeros in any constraint. This algorithm is faster than Nesterov's smoothing algorithm, which requires $O(N\sqrt{n}w/\varepsilon)$ time, where $n$ is the dimension of the problem. Our work utilizes the framework of area convexity introduced in [Sherman-FOCS'17] to obtain the best dependence on $\varepsilon$ while breaking the infamous $\ell_{\infty}$ barrier to eliminate the factor of $\sqrt{n}$.
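
To fix notation, a mixed packing-covering LP asks for a point $x \geq 0$ satisfying packing constraints $Px \leq p$ and covering constraints $Cx \geq c$ simultaneously. The toy check below only illustrates this problem format with an off-the-shelf LP solver; it is not the paper's area-convexity algorithm, and the instance is made up.

```python
# Feasibility check of a tiny mixed packing-covering LP via scipy's linprog.
import numpy as np
from scipy.optimize import linprog

P = np.array([[1.0, 2.0],
              [3.0, 1.0]])   # packing constraints  Px <= p
p = np.array([4.0, 6.0])
C = np.array([[1.0, 1.0]])   # covering constraints Cx >= c
c = np.array([2.0])

# linprog uses A_ub x <= b_ub, so flip the signs of the covering rows.
A_ub = np.vstack([P, -C])
b_ub = np.concatenate([p, -c])
res = linprog(np.zeros(2), A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 2)
print("feasible:", res.status == 0, "x =", res.x)
```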


Estimating Smooth GLM in Non-interactive Local Differential Privacy Model with Public Unlabeled Data

arXiv.org Machine Learning

In this paper, we study the problem of estimating smooth Generalized Linear Models (GLMs) in the Non-interactive Local Differential Privacy (NLDP) model. Different from the classical setting, our model allows the server to access some additional public but unlabeled data. By using Stein's lemma and its variants, we first show that there is an $(\epsilon, \delta)$-NLDP algorithm for GLMs (under some mild assumptions) if each data record is i.i.d. sampled from some sub-Gaussian distribution with bounded $\ell_1$-norm. Then, with high probability, the sample complexities of the public and private data needed for the algorithm to achieve an $\alpha$ estimation error (in $\ell_\infty$-norm) are $O(p^2\alpha^{-2})$ and $O(p^2\alpha^{-2}\epsilon^{-2})$, respectively, if $\alpha$ is not too small ({\em i.e.,} $\alpha\geq \Omega(\frac{1}{\sqrt{p}})$), where $p$ is the dimensionality of the data. This is a significant improvement over the previously known quasi-polynomial (in $\alpha$) or exponential (in $p$) complexity of GLMs with no public data. Also, our algorithm can answer multiple (at most $\exp(O(p))$) GLM queries with the same sample complexities as in the single-query case, with at least constant probability. We then extend our idea to the non-linear regression problem and show a similar phenomenon for it. Finally, we demonstrate the effectiveness of our algorithms through experiments on both synthetic and real-world datasets. To the best of our knowledge, this is the first paper showing the existence of efficient and effective algorithms for GLMs and non-linear regression in the NLDP model with public unlabeled data.
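
The role Stein's lemma plays here is easy to see in miniature: for Gaussian features $x$ and a smooth GLM $y = f(\langle w, x\rangle)$, the lemma gives $\mathbb{E}[yx] \propto w$, so each user can release a clipped, noised copy of $y_i x_i$ non-interactively and the server recovers the direction of $w$ by averaging. The sketch below assumes illustrative clipping and noise scales rather than the paper's calibrated $(\epsilon, \delta)$ values.

```python
# Non-interactive local-DP GLM direction estimate via Stein's lemma (sketch).
import numpy as np

rng = np.random.default_rng(0)
n, p = 20000, 5
w_true = np.array([1.0, -2.0, 0.5, 0.0, 1.5])

X = rng.normal(size=(n, p))                           # Gaussian features
y = np.tanh(X @ w_true) + 0.1 * rng.normal(size=n)    # a smooth GLM

B = 3.0          # clipping bound on each coordinate of y_i * x_i (assumed)
sigma = 2 * B    # illustrative Gaussian noise scale (depends on eps, delta)

# each user reports clip(y_i * x_i) + noise; the server only averages
reports = np.clip(y[:, None] * X, -B, B) + rng.normal(0, sigma, size=(n, p))
w_hat = reports.mean(axis=0)                          # proportional to w_true

cos = w_hat @ w_true / (np.linalg.norm(w_hat) * np.linalg.norm(w_true))
print("cosine similarity with true direction: %.3f" % cos)
```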


Knowledge-Enriched Transformer for Emotion Detection in Textual Conversations

arXiv.org Artificial Intelligence

Messages in human conversations inherently convey emotions. Detecting emotions in textual conversations enables a wide range of applications such as opinion mining in social networks. However, enabling machines to analyze emotions in conversations is challenging, partly because humans often rely on context and commonsense knowledge to express emotions. In this paper, we address these challenges by proposing a Knowledge-Enriched Transformer (KET), in which contextual utterances are interpreted using hierarchical self-attention and external commonsense knowledge is dynamically leveraged using a context-aware affective graph attention mechanism. Experiments on multiple textual conversation datasets demonstrate that both context and commonsense knowledge are consistently beneficial to emotion detection performance. In addition, the experimental results show that our KET model outperforms the state-of-the-art models on most of the tested datasets in F1 score.
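
The core enrichment step can be pictured as attention from a token over the concept embeddings retrieved for it from a knowledge graph. A minimal sketch, with random embeddings standing in for the knowledge base and plain scaled dot-product scores rather than KET's context-aware affective weighting:

```python
# Enriching a token representation with attended commonsense concepts (sketch).
import numpy as np

rng = np.random.default_rng(1)
d = 8
token = rng.normal(size=d)              # embedding of a token, e.g. "exam"
concepts = rng.normal(size=(4, d))      # embeddings of 4 retrieved concepts

# attention of the token over its concepts: softmax of scaled dot products
scores = concepts @ token / np.sqrt(d)
alpha = np.exp(scores - scores.max())
alpha /= alpha.sum()

enriched = token + alpha @ concepts     # knowledge-enriched representation
print("attention weights:", np.round(alpha, 3))
```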


Heterogeneous Graph Convolutional Networks for Temporal Community Detection

arXiv.org Machine Learning

Graph Convolutional Networks (GCNs) have demonstrated superior performance in representing graph data, especially homogeneous graphs. However, real-world graph data are usually heterogeneous and evolve with time, e.g., Facebook and DBLP, a setting that has seldom been studied. To cope with this issue, we propose a novel approach named Temporal Heterogeneous Graph Convolutional Network (THGCN). THGCN first embeds both spatial information and node attribute information together. Then, it captures short-term evolutionary patterns from aggregations of the embedded graph signals through a compression network. Meanwhile, the long-term evolutionary patterns of heterogeneous graph data are modeled via a temporal convolutional network (TCN). To the best of our knowledge, this is the first attempt to model temporal heterogeneous graph data with a focus on the community discovery task.
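
A minimal sketch of the two named ingredients, assuming toy random snapshots: a per-snapshot GCN layer to embed structure and attributes, followed by a causal temporal convolution (TCN-style) across snapshots. Dimensions and weights are illustrative, not THGCN's architecture.

```python
# Per-snapshot GCN embedding + causal temporal convolution (shape demo only).
import numpy as np

rng = np.random.default_rng(0)
n, f, h, T = 6, 4, 3, 5               # nodes, features, hidden dim, snapshots

def gcn_layer(A, H, W):
    """One GCN propagation: ReLU(D^{-1/2} (A+I) D^{-1/2} H W)."""
    A_hat = A + np.eye(len(A))
    d = A_hat.sum(axis=1)
    A_norm = A_hat / np.sqrt(np.outer(d, d))
    return np.maximum(A_norm @ H @ W, 0)

W = rng.normal(size=(f, h)) * 0.5
snapshots = []
for t in range(T):                    # a random evolving graph
    A = (rng.random((n, n)) < 0.3).astype(float)
    A = np.triu(A, 1); A = A + A.T
    H = rng.normal(size=(n, f))
    snapshots.append(gcn_layer(A, H, W))   # (n, h) embedding per snapshot

Z = np.stack(snapshots, axis=0)        # (T, n, h)

# causal temporal convolution, kernel size 3, over the snapshot axis
K = rng.normal(size=(3, h, h)) * 0.3
Zp = np.concatenate([np.zeros((2, n, h)), Z], axis=0)   # left padding
out = np.stack([sum(Zp[t + k] @ K[k] for k in range(3)) for t in range(T)])
print("temporal output shape:", out.shape)              # (T, n, h)
```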


Distributed Equivalent Substitution Training for Large-Scale Recommender Systems

arXiv.org Machine Learning

We present Distributed Equivalent Substitution (DES) training, a novel distributed training framework for recommender systems with large-scale dynamic sparse features. Our framework achieves faster convergence with less communication overhead and better computing resource utilization. The DES strategy splits a weights-rich operator into sub-operators with co-located weights and aggregates the partial results at a much smaller communication cost, forming a computationally equivalent substitution for the original operator. We show that for the different types of models that recommender systems use, we can always find computationally equivalent substitutions and splitting strategies for their weights-rich operators, with the theoretical communication load reduced by between 72.26% and 99.77%. We also present an implementation of DES that outperforms state-of-the-art recommender systems. Experiments show that our framework achieves up to 83% communication savings compared to other recommender systems, and can bring up to a 4.5x improvement in throughput for deep models.
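
The substitution idea is easiest to see on a wide linear layer: shard the weights across workers, let each worker compute a partial result against its co-located shard, and ship only the small partials. The sketch below simulates that communication pattern in a single process under made-up sizes; it is not the DES implementation.

```python
# Splitting a weights-rich dot product into co-located partial results.
import numpy as np

rng = np.random.default_rng(0)
dim, workers = 1_000_000, 4
w = rng.normal(size=dim)                      # the "weights-rich" operator
x = (rng.random(dim) < 1e-4).astype(float)    # sparse input features

shards = np.array_split(np.arange(dim), workers)

# each worker computes a partial dot product on its own shard of w
partials = [w[idx] @ x[idx] for idx in shards]   # only 4 scalars cross the wire
logit = sum(partials)                            # computationally equivalent

assert np.isclose(logit, w @ x)
print("logit %.4f; communicated %d floats instead of %d" %
      (logit, workers, dim))
```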


Compact Autoregressive Network

arXiv.org Machine Learning

Recurrent neural networks (RNNs) and their variants, such as Long Short-Term Memory (Hochreiter and Schmidhuber, 1997) and the Gated Recurrent Unit (Cho et al., 2014), are commonly used as the default architecture, or even as a synonym, for sequence modeling by deep learning practitioners (Goodfellow et al., 2016). Meanwhile, especially for high-dimensional time series, we may also consider autoregressive modeling or multi-task learning, $\hat{y}_t = f(y_{t-1}, y_{t-2}, \ldots, y_{t-P})$ (1), where the output $\hat{y}_t$ and each input $y_{t-i}$ are $N$-dimensional, and the lag $P$ can be very large to accommodate sequential dependence. Some non-recurrent feed-forward networks with convolutional or other specific architectures have been proposed recently for sequence modeling and are shown to achieve state-of-the-art accuracy. For example, some autoregressive networks, such as PixelCNN (Van den Oord et al., 2016b) and WaveNet (Van den Oord et al., 2016a) for image and audio sequence modeling, are compelling alternatives to recurrent networks. This paper aims at the autoregressive model (1) with a large number of sequences.
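
A worked toy instance of model (1): stack the lags $(y_{t-1}, \ldots, y_{t-P})$ into one input vector and fit a single map $f$ to predict $y_t$, here a linear least-squares map on simulated data (the paper's contribution is a compact network for $f$, not this baseline).

```python
# Linear instance of the autoregressive model (1) fitted by least squares.
import numpy as np

rng = np.random.default_rng(0)
N, P, T = 3, 4, 400                     # series dim, lag order, length

# simulate a stable VAR(1) so the lags genuinely carry signal
A = 0.6 * np.eye(N) + 0.1 * rng.normal(size=(N, N))
y = np.zeros((T, N))
for t in range(1, T):
    y[t] = y[t - 1] @ A.T + 0.1 * rng.normal(size=N)

# design matrix of stacked lags: row t is (y_{t-1}, ..., y_{t-P})
rows = [np.concatenate(y[t - P:t][::-1]) for t in range(P, T)]
X, Y = np.array(rows), y[P:]

W, *_ = np.linalg.lstsq(X, Y, rcond=None)   # f as one (N*P x N) linear map
y_hat = X @ W
print("in-sample RMSE: %.4f" % np.sqrt(((Y - y_hat) ** 2).mean()))
```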


EEG-Based Emotion Recognition Using Regularized Graph Neural Networks

arXiv.org Artificial Intelligence

EEG signals measure neuronal activity in different brain regions via electrodes. Many existing studies on EEG-based emotion recognition do not exploit the topological structure of EEG signals. In this paper, we propose a regularized graph neural network (RGNN) for EEG-based emotion recognition, which is biologically supported and captures both local and global inter-channel relations. Specifically, we model the inter-channel relations in EEG signals via an adjacency matrix in our graph neural network, where the connection and sparseness of the adjacency matrix are supported by neuroscience theories of human brain organization. In addition, we propose two regularizers, namely node-wise domain adversarial training (NodeDAT) and emotion-aware distribution learning (EmotionDL), to improve the robustness of our model against cross-subject EEG variations and noisy labels, respectively. To thoroughly evaluate our model, we conduct extensive experiments in both subject-dependent and subject-independent classification settings on two public datasets: SEED and SEED-IV. Our model obtains better performance than competitive baselines such as SVM, DBN, DGCNN, BiDANN, and the state-of-the-art BiHDM in most experimental settings. Our model analysis demonstrates that the proposed biologically supported adjacency matrix and the two regularizers contribute consistent and significant gains to the performance. Investigations of the neuronal activities reveal that pre-frontal, parietal and occipital regions may be the most informative regions for emotion recognition, which is consistent with relevant prior studies. In addition, experimental results suggest that global inter-channel relations between the left and right hemispheres are important for emotion recognition, and that local inter-channel relations between (FP1, AF3), (F6, F8) and (FP2, AF4) may also provide useful information.
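
The adjacency construction described above can be sketched as local, distance-weighted edges plus a few global left-right symmetric connections. The snippet below uses fake electrode coordinates and arbitrary weights purely to show the shape of the construction; it is not the SEED montage or RGNN's calibrated matrix.

```python
# Local distance-based edges + global left-right pairs for an EEG graph (sketch).
import numpy as np

rng = np.random.default_rng(0)
channels = ["FP1", "FP2", "AF3", "AF4", "F7", "F8"]
pos = rng.normal(size=(len(channels), 3))      # fake 3-D electrode positions

# local connections: Gaussian kernel on distance, thresholded for sparseness
D = np.linalg.norm(pos[:, None] - pos[None, :], axis=-1)
A = np.exp(-D**2 / 2.0)
A[A < 0.2] = 0.0
np.fill_diagonal(A, 0.0)

# global connections between symmetric left-right pairs (e.g. FP1 - FP2)
for l, r in [("FP1", "FP2"), ("AF3", "AF4"), ("F7", "F8")]:
    i, j = channels.index(l), channels.index(r)
    A[i, j] = A[j, i] = 1.0

# one propagation step over per-channel features
X = rng.normal(size=(len(channels), 5))
deg = A.sum(axis=1) + 1e-8
H = (A / deg[:, None]) @ X                     # row-normalized aggregation
print("aggregated feature shape:", H.shape)
```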


Scalable Topological Data Analysis and Visualization for Evaluating Data-Driven Models in Scientific Applications

arXiv.org Machine Learning

With the rapid adoption of machine learning techniques for large-scale applications in science and engineering comes the convergence of two grand challenges in visualization. First, the use of black-box models (e.g., deep neural networks) calls for advanced techniques for exploring and interpreting model behaviors. Second, the rapid growth in computing has produced enormous datasets that require techniques that can handle millions or more samples. Although some solutions to these interpretability challenges have been proposed, they typically do not scale beyond thousands of samples, nor do they provide the high-level intuition scientists are looking for. Here, we present the first scalable solution to explore and analyze high-dimensional functions often encountered in the scientific data analysis pipeline. By combining a new streaming neighborhood graph construction, the corresponding topology computation, and a novel data aggregation scheme, namely topology-aware datacubes, we enable interactive exploration of both the topological and the geometric aspects of high-dimensional data. Following two use cases from high-energy-density (HED) physics and computational biology, we demonstrate how these capabilities have led to crucial new insights in both applications.
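
One small step in such a pipeline is concrete enough to sketch: build a k-nearest-neighbor graph over samples of a scalar function and locate its graph-local maxima, the seeds for extremum-graph and other topology computations. The function, sample count, and k below are illustrative, and the construction here is batch, not the paper's streaming version.

```python
# k-NN neighborhood graph + graph-local maxima of a sampled scalar function.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(500, 2))            # samples in the domain
f = np.exp(-((X - 1)**2).sum(1)) + np.exp(-((X + 1)**2).sum(1))  # two peaks

k = 8
tree = cKDTree(X)
_, nbrs = tree.query(X, k=k + 1)                 # first neighbor is the point

# a point is a graph-local maximum if no neighbor has a larger value
maxima = [i for i in range(len(X)) if all(f[i] >= f[j] for j in nbrs[i][1:])]
print("local maxima found near:", np.round(X[maxima], 2))
```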