Goto

Collaborating Authors

 Supervised Learning


A negative case analysis of visual grounding methods for VQA

arXiv.org Artificial Intelligence

Existing Visual Question Answering (VQA) methods tend to exploit dataset biases and spurious statistical correlations, instead of producing right answers for the right reasons. To address this issue, recent bias mitigation methods for VQA propose to incorporate visual cues (e.g., human attention maps) to better ground the VQA models, showcasing impressive gains. However, we show that the performance improvements are not a result of improved visual grounding, but a regularization effect which prevents over-fitting to linguistic priors. For instance, we find that it is not actually necessary to provide proper, human-based cues; random, insensible cues also result in similar improvements. Based on this observation, we propose a simpler regularization scheme that does not require any external annotations and yet achieves near state-of-the-art performance on VQA-CPv2.


Towards Exploiting Implicit Human Feedback for Improving RDF2vec Embeddings

arXiv.org Artificial Intelligence

RDF2vec is a technique for creating vector space embeddings from an RDF knowledge graph, i.e., representing each entity in the graph as a vector. It first creates sequences of nodes by performing random walks on the graph. In a second step, those sequences are processed by the word2vec algorithm for creating the actual embeddings. In this paper, we explore the use of external edge weights for guiding the random walks. As edge weights, transition probabilities between pages in Wikipedia are used as a proxy for the human feedback for the importance of an edge. We show that in some scenarios, RDF2vec utilizing those transition probabilities can outperform both RDF2vec based on random walks as well as the usage of graph internal edge weights.


Locality Preserving Loss to Align Vector Spaces

arXiv.org Machine Learning

We present a locality preserving loss (LPL)that improves the alignment between vector space representations (i.e., word or sentence embeddings) while separating (increasing distance between) uncorrelated representations as compared to the standard method that minimizes the mean squared error (MSE) only. The locality preserving loss optimizes the projection by maintaining the local neighborhood of embeddings that are found in the source, in the target domain as well. This reduces the overall size of the dataset required to the train model. We argue that vector space alignment (with MSE and LPL losses) acts as a regularizer in certain language-based classification tasks, leading to better accuracy than the base-line, especially when the size of the training set is small. We validate the effectiveness ofLPL on a cross-lingual word alignment task, a natural language inference task, and a multi-lingual inference task.


Embedding Java Classes with code2vec: Improvements from Variable Obfuscation

arXiv.org Machine Learning

Automatic source code analysis in key areas of software engineering, such as code security, can benefit from Machine Learning (ML). However, many standard ML approaches require a numeric representation of data and cannot be applied directly to source code. Thus, to enable ML, we need to embed source code into numeric feature vectors while maintaining the semantics of the code as much as possible. code2vec is a recently released embedding approach that uses the proxy task of method name prediction to map Java methods to feature vectors. However, experimentation with code2vec shows that it learns to rely on variable names for prediction, causing it to be easily fooled by typos or adversarial attacks. Moreover, it is only able to embed individual Java methods and cannot embed an entire collection of methods such as those present in a typical Java class, making it difficult to perform predictions at the class level (e.g., for the identification of malicious Java classes). Both shortcomings are addressed in the research presented in this paper. We investigate the effect of obfuscating variable names during the training of a code2vec model to force it to rely on the structure of the code rather than specific names and consider a simple approach to creating class-level embeddings by aggregating sets of method embeddings. Our results, obtained on a challenging new collection of source-code classification problems, indicate that obfuscating variable names produces an embedding model that is both impervious to variable naming and more accurately reflects code semantics. The datasets, models, and code are shared for further ML research on source code.


Nonparametric Contextual Bandits in Metric Spaces with Unknown Metric

Neural Information Processing Systems

Consider a nonparametric contextual multi-arm bandit problem where each arm $a \in [K]$ is associated to a nonparametric reward function $f_a: [0,1] \to \mathbb{R}$ mapping from contexts to the expected reward. Suppose that there is a large set of arms, yet there is a simple but unknown structure amongst the arm reward functions, e.g. We present a novel algorithm which learns data-driven similarities amongst the arms, in order to implement adaptive partitioning of the context-arm space for more efficient learning. We provide regret bounds along with simulations that highlight the algorithm's dependence on the local geometry of the reward functions. Papers published at the Neural Information Processing Systems Conference.


Search-Guided, Lightly-Supervised Training of Structured Prediction Energy Networks

Neural Information Processing Systems

In structured output prediction tasks, labeling ground-truth training output is often expensive. However, for many tasks, even when the true output is unknown, we can evaluate predictions using a scalar reward function, which may be easily assembled from human knowledge or non-differentiable pipelines. But searching through the entire output space to find the best output with respect to this reward function is typically intractable. In this paper, we instead use efficient truncated randomized search in this reward function to train structured prediction energy networks (SPENs), which provide efficient test-time inference using gradient-based search on a smooth, learned representation of the score landscape, and have previously yielded state-of-the-art results in structured prediction. In particular, this truncated randomized search in the reward function yields previously unknown local improvements, providing effective supervision to SPENs, avoiding their traditional need for labeled training data. Papers published at the Neural Information Processing Systems Conference.


Structured Prediction with Projection Oracles

Neural Information Processing Systems

We propose in this paper a general framework for deriving loss functions for structured prediction. In our framework, the user chooses a convex set including the output space and provides an oracle for projecting onto that set. Given that oracle, our framework automatically generates a corresponding convex and smooth loss function. As we show, adding a projection as output layer provably makes the loss smaller. We identify the marginal polytope, the output space's convex hull, as the best convex set on which to project. However, because the projection onto the marginal polytope can sometimes be expensive to compute, we allow to use any convex superset instead, with potentially cheaper-to-compute projection.


Graph Structured Prediction Energy Networks

Neural Information Processing Systems

For joint inference over multiple variables, a variety of structured prediction techniques have been developed to model correlations among variables and thereby improve predictions. However, many classical approaches suffer from one of two primary drawbacks: they either lack the ability to model high-order correlations among variables while maintaining computationally tractable inference, or they do not allow to explicitly model known correlations. To address this shortcoming, we introduce'Graph Structured Prediction Energy Networks,' for which we develop inference techniques that allow to both model explicit local and implicit higher-order correlations while maintaining tractability of inference. We apply the proposed method to tasks from the natural language processing and computer vision domain and demonstrate its general utility. Papers published at the Neural Information Processing Systems Conference.


Bipartite Link Prediction based on Topological Features via 2-hop Path

arXiv.org Machine Learning

A variety of real-world systems can be modeled as bipartite networks. One of the most powerful and simple link prediction methods is Linear-Graph Autoencoder(LGAE) which has promising performance on challenging tasks such as link prediction and node clustering. LGAE relies on simple linear model w.r.t. the adjacency matrix of the graph to learn vector space representations of nodes. In this paper, we consider the case of bipartite link predictions where node attributes are unavailable. When using LGAE, we propose to multiply the reconstructed adjacency matrix with a symmetrically normalized training adjacency matrix. As a result, 2-hop paths are formed which we use as the predicted adjacency matrix to evaluate the performance of our model. Experimental results on both synthetic and real-world dataset show our approach consistently outperforms Graph Autoencoder and Linear Graph Autoencoder model in 10 out of 12 bipartite dataset and reaches competitive performances in 2 other bipartite dataset.


Localized Structured Prediction

Neural Information Processing Systems

Key to structured prediction is exploiting the problem's structure to simplify the learning process. A major challenge arises when data exhibit a local structure (i.e., are made by parts'') that can be leveraged to better approximate the relation between (parts of) the input and (parts of) the output. Recent literature on signal processing, and in particular computer vision, shows that capturing these aspects is indeed essential to achieve state-of-the-art performance. However, in this context algorithms are typically derived on a case-by-case basis. In this work we propose the first theoretical framework to deal with part-based data from a general perspective and study a novel method within the setting of statistical learning theory.