Goto

Collaborating Authors

 Inductive Learning


What are the Desired Characteristics of Calibration Sets? Identifying Correlates on Long Form Scientific Summarization

arXiv.org Artificial Intelligence

Summarization models often generate text that is poorly calibrated to quality metrics because they are trained to maximize the likelihood of a single reference (MLE). To address this, recent work has added a calibration step, which exposes a model to its own ranked outputs to improve relevance or, in a separate line of work, contrasts positive and negative sets to improve faithfulness. While effective, much of this work has focused on how to generate and optimize these sets. Less is known about why one setup is more effective than another. In this work, we uncover the underlying characteristics of effective sets. For each training instance, we form a large, diverse pool of candidates and systematically vary the subsets used for calibration fine-tuning. Each selection strategy targets distinct aspects of the sets, such as lexical diversity or the size of the gap between positive and negatives. On three diverse scientific long-form summarization datasets (spanning biomedical, clinical, and chemical domains), we find, among others, that faithfulness calibration is optimal when the negative sets are extractive and more likely to be generated, whereas for relevance calibration, the metric margin between candidates should be maximized and surprise--the disagreement between model and metric defined candidate rankings--minimized. Code to create, select, and optimize calibration sets is available at https://github.com/griff4692/calibrating-summaries


Subject-based Non-contrastive Self-Supervised Learning for ECG Signal Processing

arXiv.org Artificial Intelligence

Extracting information from the electrocardiography (ECG) signal is an essential step in the design of digital health technologies in cardiology. In recent years, several machine learning (ML) algorithms for automatic extraction of information in ECG have been proposed. Supervised learning methods have successfully been used to identify specific aspects in the signal, like detection of rhythm disorders (arrhythmias). Self-supervised learning (SSL) methods, on the other hand, can be used to extract all the features contained in the data. The model is optimized without any specific goal and learns from the data itself. By adapting state-of-the-art computer vision methodologies to the signal processing domain, a few SSL approaches have been reported recently for ECG processing. However, such SSL methods require either data augmentation or negative pairs, which limits the method to only look for similarities between two ECG inputs, either two versions of the same signal or two signals from the same subject. This leads to models that are very effective at extracting characteristics that are stable in a subject, such as gender or age. But they are not successful at capturing changes within the ECG recording that can explain dynamic aspects, like different arrhythmias or different sleep stages. In this work, we introduce the first SSL method that uses neither data augmentation nor negative pairs for understanding ECG signals, and still, achieves comparable quality representations. As a result, it is possible to design a SSL method that not only captures similarities between two inputs, but also captures dissimilarities for a complete understanding of the data. In addition, a model based on transformer blocks is presented, which produces better results than a model based on convolutional layers (XResNet50) with almost the same number of parameters.


Towards Effective Visual Representations for Partial-Label Learning

arXiv.org Artificial Intelligence

Under partial-label learning (PLL) where, for each training instance, only a set of ambiguous candidate labels containing the unknown true label is accessible, contrastive learning has recently boosted the performance of PLL on vision tasks, attributed to representations learned by contrasting the same/different classes of entities. Without access to true labels, positive points are predicted using pseudo-labels that are inherently noisy, and negative points often require large batches or momentum encoders, resulting in unreliable similarity information and a high computational overhead. In this paper, we rethink a state-of-the-art contrastive PLL method PiCO[24], inspiring the design of a simple framework termed PaPi (Partial-label learning with a guided Prototypical classifier), which demonstrates significant scope for improvement in representation learning, thus contributing to label disambiguation. PaPi guides the optimization of a prototypical classifier by a linear classifier with which they share the same feature encoder, thus explicitly encouraging the representation to reflect visual similarity between categories. It is also technically appealing, as PaPi requires only a few components in PiCO with the opposite direction of guidance, and directly eliminates the contrastive learning module that would introduce noise and consume computational resources. We empirically demonstrate that PaPi significantly outperforms other PLL methods on various image classification tasks.


Progressive Purification for Instance-Dependent Partial Label Learning

arXiv.org Artificial Intelligence

Partial label learning (PLL) aims to train multiclass classifiers from the examples each annotated with a set of candidate labels where a fixed but unknown candidate label is correct. In the last few years, the instance-independent generation process of candidate labels has been extensively studied, on the basis of which many theoretical advances have been made in PLL. Nevertheless, the candidate labels are always instance-dependent in practice and there is no theoretical guarantee that the model trained on the instance-dependent PLL examples can converge to an ideal one. In this paper, a theoretically grounded and practically effective approach named POP, i.e. PrOgressive Purification for instance-dependent partial label learning, is proposed. Specifically, POP updates the learning model and purifies each candidate label set progressively in every epoch. Theoretically, we prove that POP enlarges the region appropriately fast where the model is reliable, and eventually approximates the Bayes optimal classifier with mild assumptions. Technically, POP is flexible with arbitrary PLL losses and could improve the performance of the previous PLL losses in the instance-dependent case. Experiments on the benchmark datasets and the real-world datasets validate the effectiveness of the proposed method.


Measuring Forgetting of Memorized Training Examples

arXiv.org Artificial Intelligence

Machine learning models exhibit two seemingly contradictory phenomena: training data memorization, and various forms of forgetting. In memorization, models overfit specific training examples and become susceptible to privacy attacks. In forgetting, examples which appeared early in training are forgotten by the end. In this work, we connect these phenomena. We propose a technique to measure to what extent models "forget" the specifics of training examples, becoming less susceptible to privacy attacks on examples they have not seen recently. We show that, while non-convex models can memorize data forever in the worst-case, standard image, speech, and language models empirically do forget examples over time. We identify nondeterminism as a potential explanation, showing that deterministically trained models do not forget. Our results suggest that examples seen early when training with extremely large datasets - for instance those examples used to pre-train a model - may observe privacy benefits at the expense of examples seen later.


Weakly Supervised Learning for Analyzing Political Campaigns on Facebook

arXiv.org Artificial Intelligence

Social media platforms are currently the main channel for political messaging, allowing politicians to target specific demographics and adapt based on their reactions. However, making this communication transparent is challenging, as the messaging is tightly coupled with its intended audience and often echoed by multiple stakeholders interested in advancing specific policies. Our goal in this paper is to take a first step towards understanding these highly decentralized settings. We propose a weakly supervised approach to identify the stance and issue of political ads on Facebook and analyze how political campaigns use some kind of demographic targeting by location, gender, or age. Furthermore, we analyze the temporal dynamics of the political ads on election polls.


Adaptive Domain Generalization for Digital Pathology Images

arXiv.org Artificial Intelligence

In AI-based histopathology, domain shifts are common and well-studied. However, this research focuses on stain and scanner variations, which do not show the full picture -- shifts may be combinations of other shifts, or "invisible" shifts that are not obvious but still damage performance of machine learning models. Furthermore, it is important for models to generalize to these shifts without expensive or scarce annotations, especially in the histopathology space and if wanting to deploy models on a larger scale. Thus, there is a need for "reactive" domain generalization techniques: ones that adapt to domain shifts at test-time rather than requiring predictions of or examples of the shifts at training time. We conduct a literature review and introduce techniques that react to domain shifts rather than requiring a prediction of them in advance. We investigate test time training, a technique for domain generalization that adapts model parameters at test-time through optimization of a secondary self-supervised task.


Cone: Unsupervised Contrastive Opinion Extraction

arXiv.org Artificial Intelligence

Contrastive opinion extraction aims to extract a structured summary or key points organised as positive and negative viewpoints towards a common aspect or topic. Most recent works for unsupervised key point extraction is largely built on sentence clustering or opinion summarisation based on the popularity of opinions expressed in text. However, these methods tend to generate aspect clusters with incoherent sentences, conflicting viewpoints, redundant aspects. To address these problems, we propose a novel unsupervised Contrastive OpinioN Extraction model, called Cone, which learns disentangled latent aspect and sentiment representations based on pseudo aspect and sentiment labels by combining contrastive learning with iterative aspect/sentiment clustering refinement. Apart from being able to extract contrastive opinions, it is also able to quantify the relative popularity of aspects and their associated sentiment distributions. The model has been evaluated on both a hotel review dataset and a Twitter dataset about COVID vaccines. The results show that despite using no label supervision or aspect-denoted seed words, Cone outperforms a number of competitive baselines on contrastive opinion extraction. The results of Cone can be used to offer a better recommendation of products and services online.


CSP: Self-Supervised Contrastive Spatial Pre-Training for Geospatial-Visual Representations

arXiv.org Artificial Intelligence

Geo-tagged images are publicly available in large quantities, whereas labels such as object classes are rather scarce and expensive to collect. Meanwhile, contrastive learning has achieved tremendous success in various natural image and language tasks with limited labeled data. However, existing methods fail to fully leverage geospatial information, which can be paramount to distinguishing objects that are visually similar. To directly leverage the abundant geospatial information associated with images in pre-training, fine-tuning, and inference stages, we present Contrastive Spatial Pre-Training (CSP), a self-supervised learning framework for geo-tagged images. We use a dual-encoder to separately encode the images and their corresponding geo-locations, and use contrastive objectives to learn effective location representations from images, which can be transferred to downstream supervised tasks such as image classification. Experiments show that CSP can improve model performance on both iNat2018 and fMoW datasets. Especially, on iNat2018, CSP significantly boosts the model performance with 10-34% relative improvement with various labeled training data sampling ratios.


Unlocking the Power of Open Set : A New Perspective for Open-set Noisy Label Learning

arXiv.org Artificial Intelligence

Learning from noisy data has attracted much attention, where most methods focus on closed-set label noise. However, a more common scenario in the real world is the presence of both open-set and closed-set noise. Existing methods typically identify and handle these two types of label noise separately by designing a specific strategy for each type. However, in many real-world scenarios, it would be challenging to identify open-set examples, especially when the dataset has been severely corrupted. Unlike the previous works, we explore how models behave when faced open-set examples, and find that a part of open-set examples gradually get integrated into certain known classes, which is beneficial for the seperation among known classes. Motivated by the phenomenon, in this paper, we propose a novel two-step contrastive learning method called CECL, which aims to deal with both types of label noise by exploiting the useful information of open-set examples. Specifically, we incorporate some open-set examples into closed-set classes to enhance performance while treating others as delimiters to improve representative ability. Extensive experiments on synthetic and real-world datasets with diverse label noise demonstrate that CECL can outperform state-of-the-art methods.