Goto

Collaborating Authors

 Banff


One Positive Label is Sufficient: Single-Positive Multi-Label Learning with Label Enhancement

arXiv.org Artificial Intelligence

Multi-label learning (MLL) learns from the examples each associated with multiple labels simultaneously, where the high cost of annotating all relevant labels for each training example is challenging for real-world applications. To cope with the challenge, we investigate single-positive multi-label learning (SPMLL) where each example is annotated with only one relevant label, and show that one can successfully learn a theoretically grounded multi-label classifier for the problem. In this paper, a novel SPMLL method named SMILE, i.e., Single-positive MultI-label learning with Label Enhancement, is proposed. Specifically, an unbiased risk estimator is derived, which could be guaranteed to approximately converge to the optimal risk minimizer of fully supervised learning and shows that one positive label of each instance is sufficient to train the predictive model. Then, the corresponding empirical risk estimator is established via recovering the latent soft label as a label enhancement process, where the posterior density of the latent soft labels is approximate to the variational Beta density parameterized by an inference model. Experiments on benchmark datasets validate the effectiveness of the proposed method.


Adversarial Robustness of Deep Neural Networks: A Survey from a Formal Verification Perspective

arXiv.org Artificial Intelligence

Neural networks have been widely applied in security applications such as spam and phishing detection, intrusion prevention, and malware detection. This black-box method, however, often has uncertainty and poor explainability in applications. Furthermore, neural networks themselves are often vulnerable to adversarial attacks. For those reasons, there is a high demand for trustworthy and rigorous methods to verify the robustness of neural network models. Adversarial robustness, which concerns the reliability of a neural network when dealing with maliciously manipulated inputs, is one of the hottest topics in security and machine learning. In this work, we survey existing literature in adversarial robustness verification for neural networks and collect 39 diversified research works across machine learning, security, and software engineering domains. We systematically analyze their approaches, including how robustness is formulated, what verification techniques are used, and the strengths and limitations of each technique. We provide a taxonomy from a formal verification perspective for a comprehensive understanding of this topic. We classify the existing techniques based on property specification, problem reduction, and reasoning strategies. We also demonstrate representative techniques that have been applied in existing studies with a sample model. Finally, we discuss open questions for future research.


Weakly supervised causal representation learning

arXiv.org Artificial Intelligence

Learning high-level causal representations together with a causal model from unstructured low-level data such as pixels is impossible from observational data alone. We prove under mild assumptions that this representation is however identifiable in a weakly supervised setting. This involves a dataset with paired samples before and after random, unknown interventions, but no further labels. We then introduce implicit latent causal models, variational autoencoders that represent causal variables and causal structure without having to optimize an explicit discrete graph structure. On simple image data, including a novel dataset of simulated robotic manipulation, we demonstrate that such models can reliably identify the causal structure and disentangle causal variables.


XPrompt: Exploring the Extreme of Prompt Tuning

arXiv.org Artificial Intelligence

Prompt tuning learns soft prompts to condition frozen Pre-trained Language Models (PLMs) for performing downstream tasks in a parameter-efficient manner. While prompt tuning has gradually reached the performance level of fine-tuning as the model scale increases, there is still a large performance gap between prompt tuning and fine-tuning for models of moderate and small scales (typically less than 11B parameters). In this paper, we empirically show that the trained prompt tokens can have a negative impact on a downstream task and thus degrade its performance. To bridge the gap, we propose a novel Prompt tuning model with an eXtremely small scale (XPrompt) under the regime of lottery tickets hypothesis. Specifically, XPrompt eliminates the negative prompt tokens at different granularity levels through a hierarchical structured pruning, yielding a more parameter-efficient prompt yet with a competitive performance. Comprehensive experiments are carried out on SuperGLUE tasks, and the extensive results indicate that XPrompt is able to close the performance gap at smaller model scales.


A survey of Identification and mitigation of Machine Learning algorithmic biases in Image Analysis

arXiv.org Artificial Intelligence

The ubiquity of Machine Learning (ML) models, and more specifically deep neural network (NN) models, in all sorts of applications has become undeniable in recent years. From classifying images [1, 2, 3], detecting objects [4, 1] and performing semantic segmentation [5, 4] to translating from one human language to another [6] and doing sentiment analysis [7], the advances in different subfields of ML can be attributed mostly to the explosion of computing power and their ability to speed up the training process of artificial NNs. Most famously, AlexNet [8] allowed for an impressive jump in performance in the challenging ILSVRC2012 image classification dataset [1], also known as ImageNet, permanently cementing deep convolutional NN (CNN) architectures in the field of computer vision. Since then, architectures have gotten more refined [9, 10], training procedures have gotten increasingly more complex [11], and their performance and robustness have greatly improved as a consequence. Namely, the success of these deep CNN models is related to their ability to treat high-dimensional and complex data such as images or natural language. The impressive performance of NNs for machine learning tasks can be explained by the ability of their flexible architecture to capture meaningful information on various kinds of complex data and the fact that they are potentially composed of millions of parameters. However, this poses a major challenge: deciphering the reasoning behind the model's predictions. For instance, typical NN architectures for classification or regression problems incrementally transform the representation of the input data in the so-called latent space (or feature space) and then use this transformed representation to make their predictions, as summarized in Figure 1. Each step of this incremental data processing pipeline (or feature extraction chain) is carried out by a so-called layer, which is mathematically a non-linear function (blue rectangle in Figure 1).


A Quantitative Geometric Approach to Neural-Network Smoothness

arXiv.org Artificial Intelligence

Fast and precise Lipschitz constant estimation of neural networks is an important task for deep learning. Researchers have recently found an intrinsic trade-off between the accuracy and smoothness of neural networks, so training a network with a loose Lipschitz constant estimation imposes a strong regularization and can hurt the model accuracy significantly. In this work, we provide a unified theoretical framework, a quantitative geometric approach, to address the Lipschitz constant estimation. By adopting this framework, we can immediately obtain several theoretical results, including the computational hardness of Lipschitz constant estimation and its approximability. Furthermore, the quantitative geometric perspective can also provide some insights into recent empirical observations that techniques for one norm do not usually transfer to another one. We also implement the algorithms induced from this quantitative geometric approach in a tool GeoLIP. These algorithms are based on semidefinite programming (SDP). Our empirical evaluation demonstrates that GeoLIP is more scalable and precise than existing tools on Lipschitz constant estimation for $\ell_\infty$-perturbations. Furthermore, we also show its intricate relations with other recent SDP-based techniques, both theoretically and empirically. We believe that this unified quantitative geometric perspective can bring new insights and theoretical tools to the investigation of neural-network smoothness and robustness.


CoopHash: Cooperative Learning of Multipurpose Descriptor and Contrastive Pair Generator via Variational MCMC Teaching for Supervised Image Hashing

arXiv.org Artificial Intelligence

Leveraging supervised information can lead to superior retrieval performance in the image hashing domain but the performance degrades significantly without enough labeled data. One effective solution to boost the performance is to employ generative models, such as Generative Adversarial Networks (GANs), to generate synthetic data in an image hashing model. However, GAN-based methods are difficult to train and suffer from mode collapse issue, which prevents the hashing approaches from jointly training the generative models and the hash functions. This limitation results in sub-optimal retrieval performance. To overcome this limitation, we propose a novel framework, the generative cooperative hashing network (CoopHash), which is based on the energy-based cooperative learning. CoopHash jointly learns a powerful generative representation of the data and a robust hash function. CoopHash has two components: a top-down contrastive pair generator that synthesizes contrastive images and a bottom-up multipurpose descriptor that simultaneously represents the images from multiple perspectives, including probability density, hash code, latent code, and category. The two components are jointly learned via a novel likelihood-based cooperative learning scheme. We conduct experiments on several real-world datasets and show that the proposed method outperforms the competing hashing supervised methods, achieving up to 10% relative improvement over the current state-of-the-art supervised hashing methods, and exhibits a significantly better performance in out-of-distribution retrieval.


VICE: Variational Interpretable Concept Embeddings

arXiv.org Artificial Intelligence

A central goal in the cognitive sciences is the development of numerical models for mental representations of object concepts. This paper introduces Variational Interpretable Concept Embeddings (VICE), an approximate Bayesian method for embedding object concepts in a vector space using data collected from humans in a triplet odd-one-out task. VICE uses variational inference to obtain sparse, non-negative representations of object concepts with uncertainty estimates for the embedding values. These estimates are used to automatically select the dimensions that best explain the data. We derive a PAC learning bound for VICE that can be used to estimate generalization performance or determine a sufficient sample size for experimental design. VICE rivals or outperforms its predecessor, SPoSE, at predicting human behavior in the triplet odd-one-out task. Furthermore, VICE's object representations are more reproducible and consistent across random initializations, highlighting the unique advantage of using VICE for deriving interpretable embeddings from human behavior.


Time Will Change Things: An Empirical Study on Dynamic Language Understanding in Social Media Classification

arXiv.org Artificial Intelligence

Language features are ever-evolving in the real-world social media environment. Many trained models in natural language understanding (NLU), ineffective in semantic inference for unseen features, might consequently struggle with the deteriorating performance in dynamicity. To address this challenge, we empirically study social media NLU in a dynamic setup, where models are trained on the past data and test on the future. It better reflects the realistic practice compared to the commonly-adopted static setup of random data split. To further analyze model adaption to the dynamicity, we explore the usefulness of leveraging some unlabeled data created after a model is trained. The performance of unsupervised domain adaption baselines based on auto-encoding and pseudo-labeling and a joint framework coupling them both are examined in the experiments. Substantial results on four social media tasks imply the universally negative effects of evolving environments over classification accuracy, while auto-encoding and pseudo-labeling collaboratively show the best robustness in dynamicity.


The Influence of Explainable Artificial Intelligence: Nudging Behaviour or Boosting Capability?

arXiv.org Artificial Intelligence

This article aims to provide a theoretical account and corresponding paradigm for analysing how explainable artificial intelligence (XAI) influences people's behaviour and cognition. It uses insights from research on behaviour change. Two notable frameworks for thinking about behaviour change techniques are nudges - aimed at influencing behaviour - and boosts - aimed at fostering capability. It proposes that local and concept-based explanations are more adjacent to nudges, while global and counterfactual explanations are more adjacent to boosts. It outlines a method for measuring XAI influence and argues for the benefits of understanding it for optimal, safe and ethical human-AI collaboration.