
The Unbalanced Gromov Wasserstein Distance: Conic Formulation and Relaxation

Neural Information Processing Systems

Comparing metric measure spaces (i.e. a metric space endowed with a probability distribution) is at the heart of many machine learning problems. The most popular distance between such metric measure spaces is the Gromov-Wasserstein (GW) distance, which is the solution of a quadratic assignment problem. The GW distance is however limited to the comparison of metric measure spaces endowed with a probability distribution. To alleviate this issue, we introduce two Unbalanced Gromov-Wasserstein formulations: a distance and a more tractable upper-bounding relaxation. They both allow the comparison of metric spaces equipped with arbitrary positive measures up to isometries.
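To make the quadratic objective concrete, here is a minimal numpy sketch (our illustration, not the authors' code) that evaluates the Gromov-Wasserstein cost of a given coupling between two finite metric measure spaces; the function name gw_cost and the brute-force einsum term are assumptions of this sketch.

```python
import numpy as np

# Hypothetical sketch: the quadratic GW objective that a coupling pi is optimized over.
# D_x, D_y are pairwise-distance matrices of the two spaces; pi couples their measures
# (its marginals need not be probability vectors in the unbalanced setting).
def gw_cost(D_x, D_y, pi):
    """sum_{i,k,j,l} (D_x[i,k] - D_y[j,l])**2 * pi[i,j] * pi[k,l], expanded into three terms."""
    p = pi.sum(axis=1)                      # first marginal of the coupling
    q = pi.sum(axis=0)                      # second marginal
    term_x = (D_x ** 2 @ p) @ p             # sum_{i,k} D_x[i,k]^2 p_i p_k
    term_y = (D_y ** 2 @ q) @ q             # sum_{j,l} D_y[j,l]^2 q_j q_l
    cross = np.einsum("ik,jl,ij,kl->", D_x, D_y, pi, pi)   # naive quartic sum
    return term_x + term_y - 2.0 * cross

# Toy usage: the product of two uniform marginals is one feasible coupling.
X, Y = np.random.rand(4, 2), np.random.rand(5, 2)
D_x = np.linalg.norm(X[:, None] - X[None], axis=-1)
D_y = np.linalg.norm(Y[:, None] - Y[None], axis=-1)
pi = np.full((4, 5), 1.0 / (4 * 5))
print(gw_cost(D_x, D_y, pi))
```

The GW and unbalanced GW problems then minimize this cost over admissible couplings pi, which is what makes them quadratic assignment-type problems.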


Reviews: Beyond Vector Spaces: Compact Data Representation as Differentiable Weighted Graphs

Neural Information Processing Systems

Except that the presence of each edge is probabilistic rather than deterministic, the core idea is quite similar to Isomap. The novelty should be better positioned by comparing against Isomap. For example, edges between words that frequently co-occur in the same contexts are not independent of each other. Edges between pixels in small coherent regions are not independent. Do we eventually need to know such dependency structures a priori to correctly represent arbitrary geometry in the data?


Beyond Vector Spaces: Compact Data Representation as Differentiable Weighted Graphs

Neural Information Processing Systems

Learning useful representations is a key ingredient to the success of modern machine learning. Currently, representation learning mostly relies on embedding data into Euclidean space. However, recent work has shown that data in some domains is better modeled by non-Euclidean metric spaces, and inappropriate geometry can result in inferior performance. In this paper, we aim to eliminate the inductive bias imposed by the embedding space geometry. Namely, we propose to map data into more general non-vector metric spaces: a weighted graph equipped with a shortest-path distance. By design, such graphs can model arbitrary geometry with a proper configuration of edges and weights. Our main contribution is PRODIGE: a method that learns a weighted graph representation of data end-to-end by gradient descent. Greater generality and fewer model assumptions make PRODIGE more powerful than existing embedding-based approaches. We confirm the superiority of our method via extensive experiments on a wide range of tasks, including classification, compression, and collaborative filtering.
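PRODIGE itself also learns Bernoulli probabilities for edge existence and encourages sparsity; the sketch below (a hypothetical illustration with our own names, not the authors' code) only shows the core ingredient the abstract describes: a graph whose shortest-path metric is differentiable with respect to learnable edge weights, fitted here by gradient descent to a toy target distance matrix.

```python
import torch
import torch.nn.functional as F

def shortest_paths(w: torch.Tensor) -> torch.Tensor:
    """All-pairs shortest paths via Floyd-Warshall; differentiable in the edge weights w."""
    n = w.shape[0]
    d = w * (1.0 - torch.eye(n))                  # zero self-distances
    for k in range(n):                            # relax every path through vertex k
        d = torch.minimum(d, d[:, k:k + 1] + d[k:k + 1, :])
    return d

n = 8
target = torch.rand(n, n)
target = (target + target.T) / 2                  # a toy symmetric "ground-truth" distance matrix
target.fill_diagonal_(0.0)

theta = torch.zeros(n, n, requires_grad=True)     # one unconstrained parameter per edge
opt = torch.optim.Adam([theta], lr=0.05)
for step in range(300):
    w = F.softplus(theta)                         # positive edge weights
    w = (w + w.T) / 2                             # undirected graph
    loss = ((shortest_paths(w) - target) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Gradients flow through the min operations of Floyd-Warshall (a subgradient at ties), which is what lets the graph, rather than a fixed embedding space, adapt its geometry to the data.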


Reviews: Beyond Vector Spaces: Compact Data Representation as Differentiable Weighted Graphs

Neural Information Processing Systems

The paper proposes a quite interesting idea of representing data by weighted graphs (with shortest-path distances between nodes). Reviewers have raised concerns about edge dependency and the given similarity metric. However, I am less worried about making the independence assumption because, after all, it is a model, and it seems to work well in the experiments. Likewise, it is also common in variational inference to use an independent (fully factorized) distribution to approximate a graphical model, based on which learning is carried out. What interests me more is the general methodology of optimization.


A Simplicial Complexes, Cycles, Barcodes

Neural Information Processing Systems

A.1 Background

A simplicial complex is a combinatorial object that can be thought of as a higher-dimensional generalization of a graph. A simplicial complex S is a collection of k-simplices, which are finite (k + 1)-element subsets of a given set V, for k ≥ 0. The collection S must satisfy the condition that for each σ ∈ S, every non-empty subset of σ also belongs to S. A simplicial complex consisting only of 0- and 1-simplices is a graph. Let (Γ, m) be a weighted graph with distance-like weights, where m is the symmetric matrix of the weights attached to the edges of the graph Γ. Even though such weighted graphs do not always come from a set of points in a metric space, barcodes of weighted graphs have been successfully applied in many situations (networks, fMRI, medical data, graph classification, etc.).

A.2 Simplices describing discrepancies between the two manifolds

Here we gather more details on the construction of the sets of simplices that describe discrepancies between two point clouds P and Q sampled from the two distributions P, Q.
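As a concrete illustration of the 0-dimensional part of such a barcode, here is a short sketch (our own code and naming, not taken from the paper): every vertex is born at filtration value 0, and a connected component dies at the weight of the edge that merges it into another component, so the computation is Kruskal's algorithm with birth/death bookkeeping.

```python
def barcode_h0(n_vertices, weighted_edges):
    """H0 barcode of a weighted graph; weighted_edges is an iterable of (weight, u, v)."""
    parent = list(range(n_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]      # path halving
            x = parent[x]
        return x

    bars = []
    for w, u, v in sorted(weighted_edges):     # sweep edges by increasing weight
        ru, rv = find(u), find(v)
        if ru != rv:                           # this edge merges two components
            parent[ru] = rv
            bars.append((0.0, w))              # one of them dies at weight w
    # each remaining connected component contributes an essential bar [0, inf)
    bars += [(0.0, float("inf"))] * sum(find(x) == x for x in range(n_vertices))
    return bars

# A path graph 0 - 1 - 2 with edge weights 0.3 and 0.7:
print(barcode_h0(3, [(0.3, 0, 1), (0.7, 1, 2)]))   # [(0.0, 0.3), (0.0, 0.7), (0.0, inf)]
```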


Reviews: Deep Structured Prediction for Facial Landmark Detection

Neural Information Processing Systems

The integration of convnets with conditional random fields to model the structural dependencies of facial landmarks during face alignment is a nice contribution. Previously proposed methods in this direction were hybrid systems (e.g., OpenFace versions) and not fully integrated. The authors evaluate on multiple datasets (300W, 300W-Video, Menpo & COFW-68) and compare results with other methods. Both intra- and cross-dataset performance are provided.



Reviews: Exact inference in structured prediction

Neural Information Processing Systems

Overview:
- This paper studies the conditions for exact recovery of ground-truth labels in structured prediction under certain data-generation assumptions. In particular, the analysis generalizes the one in Globerson et al. (2015) from grid graphs to general connected graphs, providing high-probability guarantees for exact label recovery that depend on structural properties of the graph. On the other hand, the assumed generative process (lines 89-101, proposed in Globerson et al., 2015) is somewhat toy-like, which might make the results less interesting. Therefore, I am inclined towards acceptance, but not strongly.

Comments:
- I feel that the presentation can be greatly improved by including an overview of the main result at the beginning of Section 3. In particular, you could state the main result, which is actually given in Remark 2 (!), and then provide some high-level intuition on the path to proving it.


Exact inference in structured prediction

Neural Information Processing Systems

Structured prediction can be thought of as a simultaneous prediction of multiple labels. This is often done by maximizing a score function on the space of labels, which decomposes as a sum of pairwise and unary potentials. The above is naturally modeled with a graph, where edges and vertices are related to pairwise and unary potentials, respectively. We consider the generative process proposed by Globerson et al. (2015) and apply it to general connected graphs. We analyze the structural conditions of the graph that allow for the exact recovery of the labels. Our results show that exact recovery is possible and achievable in polynomial time for a large class of graphs. In particular, we show that graphs that are bad expanders can be exactly recovered by adding small edge perturbations coming from the Erdős-Rényi model. Finally, as a byproduct of our analysis, we provide an extension of Cheeger's inequality.
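As a small, hypothetical illustration of this decomposition (the toy numbers and names below are ours, not the paper's): the score of a joint labeling is the sum of unary terms over vertices and pairwise terms over edges, and on a tiny graph the exact maximizer can still be found by brute force.

```python
import numpy as np
from itertools import product

def score(labels, unary, pairwise, edges):
    """labels[v] is a class index; unary[v, y_v]; pairwise[e, y_u, y_v]; edges: list of (u, v)."""
    s = sum(unary[v, labels[v]] for v in range(len(labels)))
    s += sum(pairwise[e, labels[u], labels[v]] for e, (u, v) in enumerate(edges))
    return s

# Toy instance: three binary nodes on a path; pairwise terms reward agreeing neighbours.
edges = [(0, 1), (1, 2)]
unary = np.array([[0.0, 1.0], [0.5, 0.0], [0.0, 0.2]])
pairwise = np.tile(np.eye(2), (len(edges), 1, 1))

# Exact inference by enumeration is feasible here; the paper asks when it is
# achievable in polynomial time on much larger graphs.
best = max(product((0, 1), repeat=3), key=lambda y: score(y, unary, pairwise, edges))
print(best, score(best, unary, pairwise, edges))
```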


Reviews: Exact inference in structured prediction

Neural Information Processing Systems

The paper gives a theoretical analysis of Markov random fields. The authors address the question of when exact inference can be done in polynomial time. This is a generalization of a result in Globerson et al. (2015) from grid graphs to general connected graphs, which is, in my opinion, a non-trivial generalization. The paper is self-contained and readable for the machine learning community, although quite technical. Indeed, I consider it a theoretical paper that has all the qualities for a NeurIPS acceptance.