 Representation Of Examples


Representational dissimilarity metric spaces for stochastic neural networks

arXiv.org Artificial Intelligence

Quantifying similarity between neural representations -- e.g. hidden layer activation vectors -- is a perennial problem in deep learning and neuroscience research. Existing methods compare deterministic responses (e.g. artificial networks that lack stochastic layers) or averaged responses (e.g., trial-averaged firing rates in biological data). However, these measures of _deterministic_ representational similarity ignore the scale and geometric structure of noise, both of which play important roles in neural computation. To rectify this, we generalize previously proposed shape metrics (Williams et al. 2021) to quantify differences in _stochastic_ representations. These new distances satisfy the triangle inequality, and thus can be used as a rigorous basis for many supervised and unsupervised analyses. Leveraging this novel framework, we find that the stochastic geometries of neurobiological representations of oriented visual gratings and naturalistic scenes respectively resemble untrained and trained deep network representations. Further, we are able to more accurately predict certain network attributes (e.g. training hyperparameters) from its position in stochastic (versus deterministic) shape space.


6-DoF Robotic Grasping with Transformer

arXiv.org Artificial Intelligence

Robotic grasping aims to detect graspable points and their corresponding gripper configurations in a particular scene, and is fundamental for robot manipulation. Existing research works have demonstrated the potential of using a transformer model for robotic grasping, which can efficiently learn both global and local features. However, such methods are still limited in grasp detection on a 2D plane. In this paper, we extend a transformer model for 6-Degree-of-Freedom (6-DoF) robotic grasping, which makes it more flexible and suitable for tasks that concern safety. The key designs of our method are a serialization module that turns a 3D voxelized space into a sequence of feature tokens that a transformer model can consume and skip-connections that merge multiscale features effectively. In particular, our method takes a Truncated Signed Distance Function (TSDF) as input. After serializing the TSDF, a transformer model is utilized to encode the sequence, which can obtain a set of aggregated hidden feature vectors through multi-head attention. We then decode the hidden features to obtain per-voxel feature vectors through deconvolution and skip-connections. Voxel feature vectors are then used to regress parameters for executing grasping actions. On a recently proposed pile and packed grasping dataset, we showcase that our transformer-based method can surpass existing methods by about 5% in terms of success rates and declutter rates. We further evaluate the running time and generalization ability to demonstrate the superiority of the proposed method.


Feature space exploration as an alternative for design space exploration beyond the parametric space

arXiv.org Artificial Intelligence

This paper compares the parametric design space with a feature space generated by the extraction of design features using deep learning (DL) as an alternative way for design space exploration. In this comparison, the parametric design space is constructed by creating a synthetic dataset of 15,000 elements using a parametric algorithm and reducing its dimensions for visualization. The feature space -- reduced-dimensionality vector space of embedded data features -- is constructed by training a DL model on the same dataset. We analyze and compare the extracted design features by reducing their dimension and visualizing the results. We demonstrate that parametric design space is narrow in how it describes the design solutions because it is based on the combination of individual parameters. In comparison, we observed that the feature design space can intuitively represent design solutions according to complex parameter relationships. Based on our results, we discuss the potential of translating the features learned by DL models to provide a mechanism for intuitive design exploration space and visualization of possible design solutions.


Interaction Decompositions for Tensor Network Regression

arXiv.org Artificial Intelligence

Tensor network regression has emerged as a promising and active area of machine learning research, having achieved impressive results on common benchmark tasks such as the MovieLens 100K [1], MNIST [2][3][4][5], and Fashion MNIST [3][4][5] datasets. The effectiveness of these models can be attributed to the tensor-product transformation that is applied to the data features, which maps the original feature vector into an exponentially large vector space. By performing linear operations on this expanded feature space, tensor network models are able to generate regression outputs that are highly non-linear functions of the original features. In most tensor network models, the tensor-product transformation is constructed from a set of vector-valued functions that each act on only a single data feature. The form of these functions is important to the operation of the model, as it determines how regression on the transformed space is related to regression on the original feature space. Conventional wisdom regarding the choice of these functions can be traced back to the parallel works of Stoudenmire and Schwab [2] and Novikov et al. [1], who each proposed a different transformation scheme.
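The tensor-product transformation described above can be sketched in a few lines. This is only an illustration, not the paper's implementation: the two local maps are the forms commonly attributed to Stoudenmire and Schwab (trigonometric) and Novikov et al. (polynomial), and the function names `local_map_trig`, `local_map_poly`, and `tensor_product_features` are hypothetical. The full product vector is built explicitly here, which is feasible only for a handful of features; actual tensor network models avoid materializing it.

```python
import numpy as np
from functools import reduce

def local_map_trig(x):
    # Assumed trigonometric local map acting on one feature x in [0, 1].
    return np.array([np.cos(np.pi * x / 2), np.sin(np.pi * x / 2)])

def local_map_poly(x):
    # Assumed polynomial local map acting on one feature.
    return np.array([1.0, x])

def tensor_product_features(x, local_map):
    # Apply the local map to each feature, then take the Kronecker
    # product of the resulting 2-vectors: n features -> 2**n entries.
    return reduce(np.kron, [local_map(xi) for xi in x])

x = np.array([0.2, 0.5, 0.9])
phi = tensor_product_features(x, local_map_poly)
print(phi.shape)  # (8,)

# A linear model on phi is a multilinear polynomial in the original
# features: with the polynomial map, phi holds every product of a
# subset of features, from the constant 1 up to x1 * x2 * x3.
w = np.ones(phi.shape[0])
print(float(w @ phi))
```

With the polynomial map, each weight multiplies one interaction term between features, which is one way to see how linear regression in the expanded space becomes non-linear regression in the original space.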


Dataset Structural Index: Leveraging a machine's perspective towards visual data

arXiv.org Artificial Intelligence

But when it came to visual datasets, the field immediately stepped towards the algorithmic side. One of the fundamental reasons was the amount of information needed to translate from an image. But with the introduction of convolutional networks and transfer learning [1], [2], [3], it is possible to convert an image or a visual object into feature vectors without losing too much information about the entity under concern. It defined a way to use feature maps to compare and distinguish one visual object from another [4]. There has been a lot of work on using these feature vector conversions in systems such as content-based image retrieval [5] and on using feature vectors as representations of different scenarios [6], [7]. It is critical to understand that there is a difference between the way a machine looks at the data and the way we do. There are scenarios in which the interpretation through features is a little different from the interpretation of humans. The Dataset Structural Index (DSI) aims to bridge this gap and to understand the machine's perspective before molding it to shape better architectures and, in turn, better model performance. I think two concepts could be linked together to understand a machine's viewpoint while working with visual


Unbalanced Optimal Transport, from Theory to Numerics

arXiv.org Artificial Intelligence

Optimal Transport (OT) has recently emerged as a central tool in data sciences for comparing point clouds and, more generally, probability distributions in a geometrically faithful way. The wide adoption of OT into existing data analysis and machine learning pipelines is, however, plagued by several shortcomings. These include its lack of robustness to outliers, its high computational costs, the need for a large number of samples in high dimension, and the difficulty of handling data in distinct spaces. In this review, we detail several recently proposed approaches to mitigate these issues. We focus in particular on unbalanced OT, which compares arbitrary positive measures, not restricted to probability distributions (i.e. their total mass can vary). This generalization of OT makes it robust to outliers and missing data. The second workhorse of modern computational OT is entropic regularization, which leads to scalable algorithms while lowering the sample complexity in high dimension. The last point presented in this review is the Gromov-Wasserstein (GW) distance, which extends OT to cope with distributions belonging to different metric spaces. The main motivation for this review is to explain how unbalanced OT, entropic regularization and GW can work hand-in-hand to turn OT into efficient geometric loss functions for data sciences.
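As a rough illustration of how entropic regularization and unbalanced OT fit together, the sketch below implements Sinkhorn scaling iterations. It assumes the unbalanced variant penalizes marginal deviations with a KL term of strength `rho`, in which case the usual scaling updates are simply raised to the power `rho / (rho + eps)`; the function name `sinkhorn` and its parameters are hypothetical, and the code is not taken from the review. It works in the plain (non-log) domain, so it is suitable only for moderate values of `C / eps`.

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.5, rho=None, n_iter=500):
    """Entropic OT plan between positive weight vectors a and b.

    C   : cost matrix of shape (len(a), len(b))
    eps : entropic regularization strength
    rho : KL marginal penalty; None means balanced OT (hard marginals)
    """
    K = np.exp(-C / eps)                       # Gibbs kernel
    power = 1.0 if rho is None else rho / (rho + eps)
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = (a / (K @ v)) ** power             # rescale rows
        v = (b / (K.T @ u)) ** power           # rescale columns
    return u[:, None] * K * v[None, :]         # transport plan

x = np.linspace(0.0, 1.0, 5)
y = np.linspace(0.0, 1.0, 4)
C = (x[:, None] - y[None, :]) ** 2             # squared-distance cost
a = np.full(5, 0.2)
b = np.full(4, 0.25)

P_balanced = sinkhorn(a, b, C)                 # marginals match a and b
P_unbalanced = sinkhorn(2 * a, b, C, rho=1.0)  # total masses differ
```

In the balanced case the plan's marginals are forced to equal `a` and `b`. In the unbalanced case the inputs may have different total mass, and the plan only approximately matches them, which is the source of the robustness to outliers mentioned above.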


Convex Analysis at Infinity: An Introduction to Astral Space

arXiv.org Artificial Intelligence

Not all convex functions on $\mathbb{R}^n$ have finite minimizers; some can only be minimized by a sequence as it heads to infinity. In this work, we aim to develop a theory for understanding such minimizers at infinity. We study astral space, a compact extension of $\mathbb{R}^n$ to which such points at infinity have been added. Astral space is constructed to be as small as possible while still ensuring that all linear functions can be continuously extended to the new space. Although astral space includes all of $\mathbb{R}^n$, it is not a vector space, nor even a metric space. However, it is sufficiently well-structured to allow useful and meaningful extensions of concepts of convexity, conjugacy, and subdifferentials. We develop these concepts and analyze various properties of convex functions on astral space, including the detailed structure of their minimizers, exact characterizations of continuity, and convergence of descent algorithms.
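A standard one-dimensional example (not taken from the paper) shows what a minimizer at infinity looks like. The exponential function is convex and bounded below, yet no finite point attains its infimum:

```latex
f(x) = e^{x}, \qquad \inf_{x \in \mathbb{R}} f(x) = 0, \qquad f(x) > 0 \ \text{for all } x \in \mathbb{R}.
% The infimum is approached only along a sequence heading to infinity:
x_t = -t, \qquad f(x_t) = e^{-t} \to 0 \quad \text{as } t \to \infty.
% A two-dimensional variant: f(x_1, x_2) = e^{-x_1} + x_2^2 is minimized
% only along sequences such as (t, 0) with t \to \infty.
```

In both cases the minimizing sequence has no limit in $\mathbb{R}^n$, which is the situation that adding points at infinity is meant to handle.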


Fair Recommendation by Geometric Interpretation and Analysis of Matrix Factorization

arXiv.org Artificial Intelligence

A matrix factorization-based recommender system is, in effect, an angle-preserving dimensionality reduction technique. Since the frequency of items follows a power-law distribution, most vectors in the original dimension of the user and item feature vectors lie on the same hyperplane. However, it is very difficult to reconstruct the embeddings in the original dimension analytically, so we reformulate the original angle-preserving dimensionality reduction problem as a distance-preserving dimensionality reduction problem. We show that the input data of a recommender system, in its original higher dimension, is distributed on concentric circles with interesting properties, and we design a paraboloid-based matrix factorization named ParaMat to solve the recommendation problem. In the experiment section, we compare our algorithm with 8 other algorithms and show that our new method is the fairest when compared with modern recommender systems such as ZeroMat and DotMat Hybrid.


Neural Sheaf Diffusion: A Topological Perspective on Heterophily and Oversmoothing in GNNs

arXiv.org Artificial Intelligence

Cellular sheaves equip graphs with a "geometrical" structure by assigning vector spaces and linear maps to nodes and edges. Graph Neural Networks (GNNs) implicitly assume a graph with a trivial underlying sheaf. This choice is reflected in the structure of the graph Laplacian operator, the properties of the associated diffusion equation, and the characteristics of the convolutional models that discretise this equation. In this paper, we use cellular sheaf theory to show that the underlying geometry of the graph is deeply linked with the performance of GNNs in heterophilic settings and their oversmoothing behaviour. By considering a hierarchy of increasingly general sheaves, we study how the ability of the sheaf diffusion process to achieve linear separation of the classes in the infinite time limit expands. At the same time, we prove that when the sheaf is non-trivial, discretised parametric diffusion processes have greater control than GNNs over their asymptotic behaviour. On the practical side, we study how sheaves can be learned from data. The resulting sheaf diffusion models have many desirable properties that address the limitations of classical graph diffusion equations (and corresponding GNN models) and obtain competitive results in heterophilic settings. Overall, our work provides new connections between GNNs and algebraic topology and would be of interest to both fields.
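The claim that ordinary GNNs correspond to a trivial sheaf can be checked numerically on a toy graph. The sketch below builds a sheaf Laplacian as `delta.T @ delta`, where `delta` is the coboundary map assembled from per-edge restriction maps. It assumes every node and edge stalk has the same dimension `d`; the function name `sheaf_laplacian` and the dictionary layout of `restriction` are hypothetical, and nothing here reproduces the paper's learned models.

```python
import numpy as np

def sheaf_laplacian(n_nodes, edges, restriction, d):
    """Sheaf Laplacian for stalks of dimension d on every node and edge.

    edges       : list of (u, v) node index pairs
    restriction : dict mapping (u, v) -> (F_u, F_v), two d x d matrices
                  sending the node stalks of u and v into the edge stalk
    """
    delta = np.zeros((len(edges) * d, n_nodes * d))   # coboundary map
    for e, (u, v) in enumerate(edges):
        F_u, F_v = restriction[(u, v)]
        delta[e * d:(e + 1) * d, u * d:(u + 1) * d] = -F_u
        delta[e * d:(e + 1) * d, v * d:(v + 1) * d] = F_v
    return delta.T @ delta

# Path graph 0 - 1 - 2 with a trivial sheaf (1-dimensional stalks,
# identity restriction maps): this recovers the usual graph Laplacian.
edges = [(0, 1), (1, 2)]
trivial = {e: (np.eye(1), np.eye(1)) for e in edges}
L_trivial = sheaf_laplacian(3, edges, trivial, d=1)
print(L_trivial)

# A non-trivial sheaf: 2-dimensional stalks with a rotation on one side.
theta = np.pi / 3
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
twisted = {e: (np.eye(2), R) for e in edges}
L_twisted = sheaf_laplacian(3, edges, twisted, d=2)
```

With identity maps the diffusion drives all node features towards a common value, which is the oversmoothing behaviour; non-trivial restriction maps change which signals lie in the kernel of the Laplacian and hence what the diffusion converges to.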


Max-Min Diversification with Fairness Constraints: Exact and Approximation Algorithms

arXiv.org Artificial Intelligence

This has raised concerns about the possibility that algorithms may produce unfair and discriminatory decisions for specific population groups, particularly in sensitive socio-computational domains such as voting, hiring, banking, education, and criminal justice [12, 25]. To alleviate such concerns, there has been a lot of research devoted to incorporating fairness into the algorithms for automated decision tasks, including classification [14], clustering [10], ranking [24, 32], matching [28], and data summarization [8, 20]. This paper considers the diversity maximization problem and addresses its fairness-aware variant. The problem consists in selecting a diverse subset of items from a given dataset and is encountered in data summarization [8, 23], web search [2], recommendation [21], feature selection [31], and elsewhere [34]. Existing literature on the problem of diversity maximization primarily focuses on two objectives, namely max-min diversification (MMD), which aims to maximize the minimum distance between any pair of selected items, and max-sum diversification (MSD), which seeks to maximize the sum of pairwise distances between selected items. As shown in Figure 1, MMD tends to cover the data range uniformly, while MSD tends to pick "outliers" and may include highly similar items in the solution. Since the notion of diversity captured by MMD better represents the property that data summarization, feature selection, and many other tasks target with their solutions, we will only consider MMD in this paper. To be precise, given a set V of n items in a metric space and a positive integer k ≤ n, MMD asks for a size-k subset S of V to maximize the minimum pairwise distance within S. In particular, we study the fair max-min diversification (FMMD) problem, a variant of MMD that aims not only to maximize the diversity measure defined above but also to guarantee the satisfaction of group fairness constraints as described below.
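To make the MMD objective concrete, here is a minimal sketch of the classical greedy farthest-point heuristic for the unconstrained problem, which is known to give a 1/2-approximation in metric spaces. It is not one of the paper's FMMD algorithms and ignores the group fairness constraints entirely; the function names `greedy_max_min` and `min_pairwise_distance`, and the choice of Euclidean distance, are assumptions made for illustration.

```python
import numpy as np

def min_pairwise_distance(points, idx):
    # MMD objective: smallest Euclidean distance between selected items.
    sub = points[idx]
    d = np.linalg.norm(sub[:, None, :] - sub[None, :, :], axis=-1)
    return float(d[np.triu_indices(len(idx), k=1)].min())

def greedy_max_min(points, k, start=0):
    # Start from one item, then repeatedly add the item that is
    # farthest from everything selected so far.
    selected = [start]
    dist_to_selected = np.linalg.norm(points - points[start], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(dist_to_selected))
        selected.append(nxt)
        dist_to_selected = np.minimum(
            dist_to_selected,
            np.linalg.norm(points - points[nxt], axis=1),
        )
    return selected

# Five items on a line; pick k = 3 of them.
V = np.array([[0.0], [1.0], [2.0], [3.0], [10.0]])
S = greedy_max_min(V, k=3)
print(S, min_pairwise_distance(V, S))  # [0, 4, 3] 3.0
```

On this toy input the greedy choice happens to be optimal: any size-3 subset must contain two items from the cluster {0, 1, 2, 3} or pair one of them with 10, and no selection achieves a minimum pairwise distance above 3.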