
Hierarchical Optimal Transport for Document Representation

Neural Information Processing Systems

The ability to measure similarity between documents enables intelligent summarization and analysis of large corpora. Past distances between documents suffer from either an inability to incorporate semantic similarities between words or from scalability issues. As an alternative, we introduce hierarchical optimal transport as a meta-distance between documents, where documents are modeled as distributions over topics, which themselves are modeled as distributions over words. We then solve an optimal transport problem on the smaller topic space to compute a similarity score. We give conditions on the topics under which this construction defines a distance, and we relate it to the word mover's distance. We evaluate our technique for k-NN classification and show better interpretability and scalability with comparable performance to current methods at a fraction of the cost.
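As an illustration of the construction, the distance can be sketched with the POT optimal-transport library: topic-to-topic costs are themselves optimal-transport distances between the topics' word distributions over word embeddings, and the document distance is an optimal-transport problem over topic proportions. The names topic_word_dists, word_embeddings, and the normalized inputs below are illustrative assumptions, not the authors' released code.

    # Minimal sketch of hierarchical optimal transport between two documents.
    # Requires: pip install pot numpy
    import numpy as np
    import ot  # Python Optimal Transport (POT)

    def topic_cost_matrix(topic_word_dists, word_embeddings):
        """Cost between topics = OT distance between their word distributions."""
        word_cost = ot.dist(word_embeddings, word_embeddings)  # pairwise squared Euclidean
        k = topic_word_dists.shape[0]
        C = np.zeros((k, k))
        for i in range(k):
            for j in range(i + 1, k):
                # word mover's-style distance between topic i and topic j
                C[i, j] = C[j, i] = ot.emd2(topic_word_dists[i], topic_word_dists[j], word_cost)
        return C

    def hott_distance(doc1_topics, doc2_topics, topic_cost):
        """Document distance = optimal transport over topic proportions."""
        return ot.emd2(doc1_topics, doc2_topics, topic_cost)

    # Usage: topic_word_dists is (k_topics, vocab) with rows summing to 1,
    # word_embeddings is (vocab, dim), and doc*_topics are length-k topic
    # proportions from the topic model (also summing to 1).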


Discriminator optimal transport

Neural Information Processing Systems

We show that discriminator optimal transport improves the Inception Score and FID of unconditional GANs trained on CIFAR-10 and STL-10, as well as of a public pre-trained conditional GAN trained on ImageNet.
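The idea can be sketched as discriminator-guided refinement of a generated sample: move the sample toward a higher discriminator score while penalizing the distance to its starting point. The objective, step size, and the discriminator interface below are illustrative assumptions rather than the paper's exact procedure.

    # Hedged sketch of discriminator-guided sample refinement in the spirit of
    # discriminator optimal transport. `discriminator` is any trained critic
    # mapping images to real-valued scores (assumed interface).
    import torch

    def refine_samples(x_gen, discriminator, steps=10, eps=0.01, keep_close=1.0):
        x0 = x_gen.detach()
        x = x0.clone().requires_grad_(True)
        for _ in range(steps):
            # Trade off staying near the original sample against a higher critic score.
            proximity = (x - x0).flatten(1).norm(dim=1).sum()
            objective = keep_close * proximity - discriminator(x).sum()
            grad, = torch.autograd.grad(objective, x)
            x = (x - eps * grad).detach().requires_grad_(True)  # descend the objective
        return x.detach()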


Reviewer # 1 > It is better to add a section: comparison with related works, to highlight the main contributions

Neural Information Processing Systems

We wish to express our appreciation to the reviewers for their insightful comments on our paper. All responses are reflected in the camera-ready version. Thank you for the proposal. We are sorry that our writing is hard to follow. Thank you for the important comment.


Learning-Augmented Algorithms with Explicit Predictors

Neural Information Processing Systems

Recent advances in algorithmic design show how to utilize predictions obtained by machine learning models from past and present data. These approaches have demonstrated an enhancement in performance when the predictions are accurate, while also ensuring robustness by providing worst-case guarantees when predictions fail. In this paper we focus on online problems; prior research in this context has focused on a paradigm where the algorithms are oblivious to the predictors' design, treating them as a black box. In contrast, in this work we unpack the predictor and integrate the learning problem it gives rise to within the algorithmic challenge. In particular, we allow the predictor to learn as it receives larger parts of the input, with the ultimate goal of designing online learning algorithms specifically tailored for the algorithmic task at hand. Adopting this perspective, we focus on a number of fundamental problems, including caching and scheduling, which have been well studied in the black-box setting. For each of the problems, we introduce new algorithms that take advantage of explicit and carefully designed learning rules.
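As a toy illustration of the setting (not one of the paper's algorithms), a caching policy can consult an explicit predictor of each item's next request time and evict the item predicted to be requested furthest in the future, mimicking Belady's offline rule; predict_next_use is an assumed predictor that may keep learning as more of the request sequence arrives.

    # Minimal sketch of prediction-augmented caching with an explicit predictor.
    def run_cache(requests, capacity, predict_next_use):
        cache, misses = set(), 0
        for t, item in enumerate(requests):
            if item in cache:
                continue
            misses += 1
            if len(cache) >= capacity:
                # Evict the item whose next use is predicted to be furthest away.
                victim = max(cache, key=lambda held: predict_next_use(held, t))
                cache.remove(victim)
            cache.add(item)
        return misses

With a perfect predictor this recovers the offline-optimal eviction rule; with a poor one it degrades, which is exactly the accuracy/robustness trade-off described above.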


Extracting Training Data from Molecular Pre-trained Models

Neural Information Processing Systems

Graph Neural Networks (GNNs) have significantly advanced the field of drug discovery, enhancing the speed and efficiency of molecular identification. However, training these GNNs demands vast amounts of molecular data, which has spurred the emergence of collaborative model-sharing initiatives. These initiatives facilitate the sharing of molecular pre-trained models among organizations without exposing proprietary training data. Despite the benefits, these molecular pre-trained models may still pose privacy risks. For example, malicious adversaries could perform data extraction attacks to recover private training data, thereby threatening commercial secrets and collaborative trust.


Benchmarking the Attribution Quality of Vision Models
Robin Hesse, Simone Schaub-Meyer, Stefan Roth (Department of Computer Science, Technical University of Darmstadt)

Neural Information Processing Systems

Attribution maps are one of the most established tools to explain the functioning of computer vision models. They assign importance scores to input features, indicating how relevant each feature is for the prediction of a deep neural network. While much research has gone into proposing new attribution methods, their proper evaluation remains a difficult challenge. In this work, we propose a novel evaluation protocol that overcomes two fundamental limitations of the widely used incremental-deletion protocol, i.e., the out-of-domain issue and lacking inter-model comparisons. This allows us to evaluate 23 attribution methods and how different design choices of popular vision backbones affect their attribution quality. We find that intrinsically explainable models outperform standard models and that raw attribution values exhibit a higher attribution quality than what is known from previous work. Further, we show consistent changes in the attribution quality when varying the network design, indicating that some standard design choices promote attribution quality.
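For reference, the incremental-deletion protocol the abstract contrasts with can be sketched as follows: pixels are removed in order of decreasing attributed importance while the model's score for the target class is tracked. The fill value, step schedule, and channel-first layout below are assumptions, and replacing pixels with such a constant is precisely what creates the out-of-domain issue the paper addresses.

    # Hedged sketch of the standard incremental-deletion evaluation of an
    # attribution map (not the protocol proposed in the paper).
    # `model` maps a (1, C, H, W) batch to class scores; `attribution` is (H, W).
    import numpy as np

    def deletion_curve(model, image, attribution, target_class, steps=20, fill_value=0.0):
        h, w = attribution.shape
        order = np.argsort(attribution.ravel())[::-1]     # most important pixels first
        per_step = len(order) // steps
        x, scores = image.copy(), []
        for s in range(steps + 1):
            scores.append(float(model(x[None])[0, target_class]))
            idx = order[s * per_step:(s + 1) * per_step]
            rows, cols = np.unravel_index(idx, (h, w))
            x[..., rows, cols] = fill_value               # "delete" the next chunk of pixels
        return np.array(scores)  # a faster score drop indicates a more faithful attribution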


Audio-Driven Co-Speech Gesture Video Generation (Supplemental Document)

Neural Information Processing Systems

In this supplemental document, we introduce the following contents: 1) the proof of Theorem 1 (the unique Cholesky decomposition theorem); ...; 13) the licenses of existing assets involved in this paper. In the main paper, to ease the constraint in the quantization process, we use the unique Cholesky decomposition theorem [13] to transform the covariance matrix C into the factorial covariance L via Theorem 1. Because C has positive leading principal minors (i.e., it is positive definite), the diagonal entries of L are non-zero. The output of the GPT [10] model at the t-th time step is the probability of choosing each codebook entry, where the entry with the largest probability serves as the predicted motion code of the next time step.
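The theorem being invoked is the standard uniqueness result: a symmetric positive-definite covariance matrix C factors uniquely as C = L Lᵀ with L lower triangular and positive (hence non-zero) diagonal entries. A minimal numerical check with NumPy:

    # Numerical illustration of the unique Cholesky decomposition C = L @ L.T.
    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 4))
    C = A @ A.T + 4.0 * np.eye(4)       # a symmetric positive-definite covariance

    L = np.linalg.cholesky(C)           # lower-triangular factor
    assert np.allclose(L @ L.T, C)      # reconstructs the covariance
    assert np.all(np.diag(L) > 0)       # diagonal entries are positive, hence non-zero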


Learning from Bad Data via Generation

Neural Information Processing Systems

Bad training data hinder the learning model from understanding the underlying data-generating scheme, which in turn increases the difficulty of achieving satisfactory performance on unseen test data. We suppose the real data distribution lies in a distribution set supported by the empirical distribution of the bad data. A worst-case formulation can be developed over this distribution set and then interpreted as a generation task in an adversarial manner. The connections and differences between GANs and our framework are thoroughly discussed. We further theoretically show the influence of this generation task on learning from bad data and reveal its connection with a data-dependent regularization. Given different distance measures of distributions (e.g., Wasserstein distance or JS divergence), we can derive different objective functions for the problem. Experimental results on different kinds of bad training data demonstrate the necessity and effectiveness of the proposed method.
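The worst-case formulation over the distribution set is typically written in the following distributionally robust form (a sketch of the standard objective; the paper's Eq. (1), referenced in the rebuttal below, may differ in details), where \hat{P} is the empirical distribution of the bad training data, D is a distance between distributions such as the Wasserstein distance or the JS divergence, and \ell is the loss:

    \min_{\theta} \; \sup_{Q \,:\, D(Q, \hat{P}) \le \epsilon} \; \mathbb{E}_{x \sim Q}\big[\ell(\theta; x)\big]

In practice the radius constraint is often replaced by a Lagrangian penalty with multiplier λ, which matches the rebuttal's remark below that λ is tuned as a hyperparameter because ε is unknown.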


... presentation and fix all minor issues in the final version. ... distributions within the ball of an appropriate radius ε (see Eq. (1)), which could also include the unknown real distribution P.

Neural Information Processing Systems

We thank the reviewers for the constructive comments. First, generators in existing methods tend to fit the empirical distribution; given a bad training set, their generated data could be even worse. Second, these generators often produce "easy" samples ... Since ε is unknown, it is common to take λ as a hyperparameter to be tuned in experiments (e.g., ...). Moreover, the generator could conduct "data augmentation" for the ... We may thus obtain a slightly better result, e.g., ...


L_DMI: A Novel Information-theoretic Loss Function for Training Deep Nets Robust to Label Noise

Neural Information Processing Systems

Accurately annotating large-scale datasets is notoriously expensive in both time and money. Although acquiring low-quality annotated datasets can be much cheaper, using such datasets without particular treatment often badly damages the performance of trained models. Various methods have been proposed for learning with noisy labels. However, most methods only handle limited kinds of noise patterns, require auxiliary information or steps (e.g., knowing or estimating the noise transition matrix), or lack theoretical justification.