 dimensionality reduction


Nonlinear Data Integration via Kernel Methods for Data Collaboration Analysis

arXiv.org Machine Learning

Collaborative analysis of decentralized confidential datasets is important, but direct sharing of original datasets is often restricted by privacy and institutional constraints. Data collaboration (DC) analysis transforms each dataset into privacy-preserving intermediate representations via party-specific obfuscation functions and integrates them into common collaboration representations using an anchor dataset. However, many existing DC analysis methods rely on linear transformations for data obfuscation and integration, which may increase reconstruction risk. Although nonlinear dimensionality reduction can mitigate this risk, conventional linear integration methods cannot accurately align intermediate representations produced by nonlinear transformations. Moreover, existing integration methods mainly minimize discrepancies among parties and do not explicitly incorporate geometric or target-variable information useful for downstream analysis. To overcome these limitations, we first formulate linear kernel integration (LKI) as a linear integration method and then kernelize it to obtain nonlinear kernel integration (NKI). NKI admits a globally optimal solution via kernel ridge regression and an eigenvalue problem. We also introduce graph regularization and a centering constraint so that the target representation can capture geometric and target-variable information useful for downstream analysis. Experiments on image classification tasks demonstrate that NKI improves classification accuracy over existing linear integration methods under nonlinear dimensionality reduction, with further gains from target-variable-aware graph regularization and centering. The results also show that dimensionality reduction choices substantially affect both classification accuracy and reconstruction risk.
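
The kernel ridge regression step mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's NKI algorithm: it only shows how one party could map its intermediate representation of the anchor data onto a given common target with an RBF-kernel ridge regression. The function names, the RBF kernel choice, and the values of `gamma` and `lam` are assumptions made for illustration.

```python
import numpy as np

def rbf(A, B, gamma):
    # Pairwise RBF kernel exp(-gamma * ||a - b||^2) between rows of A and B.
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def fit_krr(anchor_rep, target, gamma=1.0, lam=1e-3):
    # Solve (K + lam * I) C = target for the dual coefficients C.
    # anchor_rep: one party's intermediate representation of the anchor data.
    # target: common target representation of the same anchor rows (assumed given).
    K = rbf(anchor_rep, anchor_rep, gamma)
    return np.linalg.solve(K + lam * np.eye(K.shape[0]), target)

def apply_krr(new_rep, anchor_rep, coef, gamma=1.0):
    # Map new intermediate representations into the common space.
    return rbf(new_rep, anchor_rep, gamma) @ coef
```

In this sketch each party would fit its own coefficients against the same target, so that representations produced by different nonlinear obfuscation functions land in one shared space. How the target itself is chosen (the eigenvalue problem, graph regularization, and centering constraint described in the abstract) is not covered here.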


Center Smoothing: Certified Robustness for Networks with Structured Outputs Appendix

Neural Information Processing Systems

Let $y$ be a point in that intersection. Since, by definition, $\hat{r}(x', \Delta)$ is the radius of the smallest ball with $1/2 + \Delta$ probability mass of $f(x' + P)$ over all possible centers in $\mathbb{R}^k$, and $\hat{R}$ is the radius of the smallest such ball centered at $\hat{f}(x)$, we must have $\hat{r}(x', \Delta) \le \hat{R}$. Consider the smallest ball $B(z', \hat{r}(x, \Delta_1))$ that encloses at least $1/2 + \Delta_1$ probability mass of $f(x + P)$. Since $r$ is the radius of the minimum enclosing ball that contains at least half of the points in $Z$, we have $r \le \hat{r}(x, \Delta_1)$. Now, using the definition of $\hat{R}$ and following the same reasoning as Theorem 2, we can say that $d(\hat{f}(x), \hat{f}(x')) \le \beta \hat{r}(x', \Delta) + \hat{R} \le (1 + \beta) \hat{R}$.
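
The ball "that contains at least half of the points in Z" can be approximated in a few lines. The sketch below is an assumption-laden illustration, not the paper's procedure: it restricts candidate centers to the sample points themselves and picks the one with the smallest median Euclidean distance to all samples. The function name `approx_center` and the use of the Euclidean metric are hypothetical choices.

```python
import numpy as np

def approx_center(Z):
    # Z: (n, k) array of sampled outputs, e.g. f(x + noise) for n noise draws.
    # Pairwise Euclidean distances between all samples.
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    # For each candidate center, the median distance to all samples
    # (self-distance included) is the radius of a ball around it
    # holding roughly half of the points.
    med = np.median(D, axis=1)
    i = int(np.argmin(med))
    return Z[i], float(med[i])
```

Restricting the search to sample points keeps the cost at O(n^2) distance evaluations. The returned radius can exceed that of the true minimum enclosing ball over all centers, which is the kind of gap an approximation factor such as $\beta$ would account for.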


Appendix: Learning Compact Representations of Neural Networks using DiscriminAtive Masking (DAM) A Analysis of the DAM Gate Function Dynamics During Training

Neural Information Processing Systems

In this section, we theoretically analyze the dynamics of the DAM mask $g_i$ at the $i$-th layer as the training process unfolds. The loss function for training the neural network for the target task can then be denoted as $L = L(f(x, \Theta, \beta_i))$ (e.g., cross-entropy loss for supervised structured pruning problems and reconstruction error for representation learning problems), where $x$ denotes the input features to the neural network. Using gradient descent methods with a learning rate of $\eta$, the expected update formula of $\beta_i$ in DAM is given by:

$$\Delta\beta_i = -\eta \, \mathbb{E}_{x \sim D_{tr}} \left[ \nabla_{\beta_i} L(f(x, \Theta, \beta_i)) + \lambda \nabla_{\beta_i} \beta_i / (l - 1) \right] \quad (2)$$

$$= -\eta \, \mathbb{E}_{x \sim D_{tr}} \left[ \nabla_{\beta_i} L(f(x, \Theta, \beta_i)) \right] - \eta\lambda / (l - 1) \quad (3)$$

Let $h_i$ be the layer output before applying the DAM mask, and the masked output be represented as $o_i = h_i \odot g_i$ after applying the gate. For the $j$-th neuron, $\partial g_{ij} / \partial \beta_i = 0$ if and only if $\partial \xi_j(\beta_i) / \partial \beta_i = 0$. Since $\tanh(z)$ has non-zero gradients for $z > 0$, the gradient of $\xi_j(\beta_i)$ is 0 only when $kj/n_i + \beta_i \le 0$, i.e., the mask value of the neuron is 0 (or in other words, it is deactivated or dead). Let us denote the set of all neuron indices with non-zero mask values (also referred to as active neurons) as $J$. Equation 4 can then be simplified as: $\nabla_{\beta_i} L(f(x, \Theta, \beta_i)) = \alpha_i \sum \cdots$ We can make the following two observations: (i) only those neurons that are active (i.e., have non-zero mask values) have a contribution towards updating $\beta_i$ and moving the gate function. We name these neurons as support neurons and their position in the ordering of neurons as the transitioning zone of the gate function.
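
The claim that dead neurons contribute nothing to the update of the gate offset can be checked numerically. The sketch below assumes a gate of the form max(0, tanh(alpha * (k*j/n + beta))). That form is only inferred from the tanh and kj/n terms in the excerpt and is not confirmed by it. The defaults `k=5.0` and `alpha=1.0` are hypothetical.

```python
import numpy as np

def dam_gate(beta, n, k=5.0, alpha=1.0):
    # Assumed gate: ReLU(tanh(alpha * (k * j / n + beta))) for neurons j = 0..n-1.
    j = np.arange(n)
    z = alpha * (k * j / n + beta)
    return np.maximum(0.0, np.tanh(z))

def dam_gate_grad_beta(beta, n, k=5.0, alpha=1.0):
    # Derivative of each mask value with respect to beta.
    # It is zero wherever the pre-activation is non-positive (dead neurons)
    # and alpha * (1 - tanh(z)^2) elsewhere.
    j = np.arange(n)
    z = alpha * (k * j / n + beta)
    return np.where(z > 0, alpha * (1.0 - np.tanh(z) ** 2), 0.0)
```

With `beta = -2.0` and `n = 10`, neurons 0 to 4 have non-positive pre-activations, so both their mask values and their gradients with respect to beta are zero. Only the remaining active neurons would move the gate under this assumed form.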



Disentangling Identifiable Features from Noisy Data with Structured Nonlinear ICA

Neural Information Processing Systems

We introduce a new general identifiable framework for principled disentanglement referred to as Structured Nonlinear Independent Component Analysis (SNICA). Our contribution is to extend the identifiability theory of deep generative models for a very broad class of structured models. While previous works have shown identifiability for specific classes of time-series models, our theorems extend this to more general temporal structures as well as to models with more complex structures such as spatial dependencies. In particular, we establish the major result that identifiability for this framework holds even in the presence of noise of unknown distribution. Finally, as an example of our framework's flexibility, we introduce the first nonlinear ICA model for time-series that combines the following very useful properties: it accounts for both nonstationarity and autocorrelation in a fully unsupervised setting; performs dimensionality reduction; models hidden states; and enables principled estimation and inference by variational maximum-likelihood.


Nearly Isometric Embedding by Relaxation

Neural Information Processing Systems

Many manifold learning algorithms aim to create embeddings with low or no distortion (isometric). If the data has intrinsic dimension d, it is often impossible to obtain an isometric embedding in d dimensions, but possible in s > d dimensions. Yet, most geometry preserving algorithms cannot do the latter. This paper proposes an embedding algorithm to overcome this. The algorithm accepts as input, besides the dimension d, an embedding dimension s ≥ d.


Dimensionality Reduction of Massive Sparse Datasets Using Coresets

Neural Information Processing Systems

In this paper we present a practical solution with performance guarantees to the problem of dimensionality reduction for very large scale sparse matrices. We show applications of our approach to computing the Principal Component Analysis (PCA) of any n × d matrix, using one pass over the stream of its rows. Our solution uses coresets: a scaled subset of the n rows that approximates their sum of squared distances to every k-dimensional affine subspace. An open theoretical problem has been to compute such a coreset that is independent of both n and d. An open practical problem has been to compute a non-trivial approximation to the PCA of very large but sparse databases such as the Wikipedia document-term matrix in a reasonable time. We answer both of these questions affirmatively. Our main technical result is a new framework for deterministic coreset constructions based on a reduction to the problem of counting items in a stream.


Joint Representation Learning and Clustering via Gradient-Based Manifold Optimization

arXiv.org Machine Learning

Clustering and dimensionality reduction have been crucial topics in machine learning and computer vision. Clustering high-dimensional data has been challenging for a long time due to the curse of dimensionality. For that reason, a more promising direction is the joint learning of dimension reduction and clustering. In this work, we propose a Manifold Learning Framework that learns dimensionality reduction and clustering simultaneously. The proposed framework is able to jointly learn the parameters of a dimension reduction technique (e.g. linear projection or a neural network) and cluster the data based on the resulting features (e.g. under a Gaussian Mixture Model framework). The framework searches for the dimension reduction parameters and the optimal clusters by traversing a manifold, using Gradient Manifold Optimization. The proposed framework is exemplified with a Gaussian Mixture Model as one simple but efficient example, in a process that is somewhat similar to unsupervised Linear Discriminant Analysis (LDA). We apply the proposed method to the unsupervised training of simulated data as well as a benchmark image dataset (i.e. MNIST). The experimental results indicate that our algorithm has better performance than popular clustering algorithms from the literature.


Model-based targeted dimensionality reduction for neuronal population data

Neural Information Processing Systems

Summarizing high-dimensional data using a small number of parameters is a ubiquitous first step in the analysis of neuronal population activity. Recently developed methods use targeted approaches that work by identifying multiple, distinct low-dimensional subspaces of activity that capture the population response to individual experimental task variables, such as the value of a presented stimulus or the behavior of the animal. These methods have gained attention because they decompose total neural activity into what are ostensibly different parts of a neuronal computation. However, existing targeted methods have been developed outside of the confines of probabilistic modeling, making some aspects of the procedures ad hoc, or limited in flexibility or interpretability. Here we propose a new model-based method for targeted dimensionality reduction based on a probabilistic generative model of the population response data.


Scaling Gaussian Process Regression with Derivatives

Neural Information Processing Systems

Computing the model fit term, as well as the predictive moments of the GP, requires solving linear systems with the kernel matrix, while the complexity term, or Occam's factor [18], is the log determinant of the kernel matrix.
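
The two terms named above can be made concrete with a minimal sketch of the standard GP log marginal likelihood, computed through a Cholesky factorization. This is plain GP regression without derivative observations and without the scaling techniques the paper is about. The RBF kernel, the `lengthscale` and `noise` parameters, and the function names are assumptions for illustration.

```python
import numpy as np

def rbf_kernel(X, lengthscale=1.0):
    # Squared-exponential kernel matrix for the rows of X.
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-0.5 * np.maximum(d2, 0.0) / lengthscale**2)

def gp_log_marginal_terms(X, y, noise=0.1):
    # Returns the three additive terms of log p(y | X):
    # model fit, complexity (Occam's factor), and the normalizing constant.
    n = y.shape[0]
    K = rbf_kernel(X) + noise * np.eye(n)
    L = np.linalg.cholesky(K)  # K = L @ L.T
    # Two triangular-system solves give K^{-1} y.
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    fit = -0.5 * y @ alpha
    # -0.5 * log det K, using log det K = 2 * sum(log(diag(L))).
    complexity = -np.sum(np.log(np.diag(L)))
    const = -0.5 * n * np.log(2.0 * np.pi)
    return fit, complexity, const
```

The Cholesky factorization costs O(n^3) for an n × n kernel matrix, which is the bottleneck that motivates scalable alternatives once the matrix becomes large.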