Goto

Collaborating Authors

 Roy, Subhrajit


Safety Alignment Should Be Made More Than Just a Few Tokens Deep

arXiv.org Artificial Intelligence

The safety alignment of current Large Language Models (LLMs) is vulnerable. Relatively simple attacks, or even benign fine-tuning, can jailbreak aligned models. We argue that many of these vulnerabilities are related to a shared underlying issue: safety alignment can take shortcuts, wherein the alignment adapts a model's generative distribution primarily over only its very first few output tokens. We refer to this issue as shallow safety alignment. In this paper, we present case studies to explain why shallow safety alignment can exist and provide evidence that current aligned LLMs are subject to this issue. We also show how these findings help explain multiple recently discovered vulnerabilities in LLMs, including the susceptibility to adversarial suffix attacks, prefilling attacks, decoding parameter attacks, and fine-tuning attacks. Importantly, we discuss how this consolidated notion of shallow safety alignment sheds light on promising research directions for mitigating these vulnerabilities. For instance, we show that deepening the safety alignment beyond just the first few tokens can often meaningfully improve robustness against some common exploits. Finally, we design a regularized finetuning objective that makes the safety alignment more persistent against fine-tuning attacks by constraining updates on initial tokens. Overall, we advocate that future safety alignment should be made more than just a few tokens deep.


Batch Calibration: Rethinking Calibration for In-Context Learning and Prompt Engineering

arXiv.org Artificial Intelligence

Prompting and in-context learning (ICL) have become efficient learning paradigms for large language models (LLMs). However, LLMs suffer from prompt brittleness and various bias factors in the prompt, including but not limited to the formatting, the choice verbalizers, and the ICL examples. To address this problem that results in unexpected performance degradation, calibration methods have been developed to mitigate the effects of these biases while recovering LLM performance. In this work, we first conduct a systematic analysis of the existing calibration methods, where we both provide a unified view and reveal the failure cases. Inspired by these analyses, we propose Batch Calibration (BC), a simple yet intuitive method that controls the contextual bias from the batched input, unifies various prior approaches, and effectively addresses the aforementioned issues. BC is zero-shot, inference-only, and incurs negligible additional costs. In the few-shot setup, we further extend BC to allow it to learn the contextual bias from labeled data. We validate the effectiveness of BC with PaLM 2-(S, M, L) and CLIP models and demonstrate state-of-the-art performance over previous calibration baselines across more than 10 natural language understanding and image classification tasks. Prompting large language models (LLMs) (Chowdhery et al., 2022; Anil et al., 2023) has become Extensive research has been devoted to mitigating these biases, which we explicitly refer to the a-priori propensity of LLMs to predict certain classes over others unfairly.


Gemini: A Family of Highly Capable Multimodal Models

arXiv.org Artificial Intelligence

This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of Gemini models in cross-modal reasoning and language understanding will enable a wide variety of use cases and we discuss our approach toward deploying them responsibly to users.


Benchmarking Continuous Time Models for Predicting Multiple Sclerosis Progression

arXiv.org Artificial Intelligence

Multiple sclerosis is a disease that affects the brain and spinal cord, it can lead to severe disability and has no known cure. The majority of prior work in machine learning for multiple sclerosis has been centered around using Magnetic Resonance Imaging scans or laboratory tests; these modalities are both expensive to acquire and can be unreliable. In a recent paper it was shown that disease progression can be predicted effectively using performance outcome measures and demographic data. In our work we build on this to investigate the modeling side, using continuous time models to predict progression. We benchmark four continuous time models using a publicly available multiple sclerosis dataset. We find that the best continuous model is often able to outperform the best benchmarked discrete time model. We also carry out an extensive ablation to discover the sources of performance gains, we find that standardizing existing features leads to a larger performance increase than interpolating missing features.


STUDY: Socially Aware Temporally Causal Decoder Recommender Systems

arXiv.org Artificial Intelligence

Recommender systems are widely used to help people find items that are tailored to their interests. These interests are often influenced by social networks, making it important to use social network information effectively in recommender systems. This is especially true for demographic groups with interests that differ from the majority. This paper introduces STUDY, a Socially-aware Temporally caUsal Decoder recommender sYstem. STUDY introduces a new socially-aware recommender system architecture that is significantly more efficient to learn and train than existing methods. STUDY performs joint inference over socially connected groups in a single forward pass of a modified transformer decoder network. We demonstrate the benefits of STUDY in the recommendation of books for students who are dyslexic, or struggling readers. Dyslexic students often have difficulty engaging with reading material, making it critical to recommend books that are tailored to their interests. We worked with our non-profit partner Learning Ally to evaluate STUDY on a dataset of struggling readers. STUDY was able to generate recommendations that more accurately predicted student engagement, when compared with existing methods.


Maintaining fairness across distribution shift: do we have viable solutions for real-world applications?

arXiv.org Machine Learning

Fairness and robustness are often considered as orthogonal dimensions when evaluating machine learning models. However, recent work has revealed interactions between fairness and robustness, showing that fairness properties are not necessarily maintained under distribution shift. In healthcare settings, this can result in e.g. a model that performs fairly according to a selected metric in "hospital A" showing unfairness when deployed in "hospital B". While a nascent field has emerged to develop provable fair and robust models, it typically relies on strong assumptions about the shift, limiting its impact for real-world applications. In this work, we explore the settings in which recently proposed mitigation strategies are applicable by referring to a causal framing. Using examples of predictive models in dermatology and electronic health records, we show that real-world applications are complex and often invalidate the assumptions of such methods. Our work hence highlights technical, practical, and engineering gaps that prevent the development of robustly fair machine learning models for real-world applications. Finally, we discuss potential remedies at each step of the machine learning pipeline.


Machine Learning for removing EEG artifacts: Setting the benchmark

arXiv.org Machine Learning

Electroencephalograms (EEG) are often contaminated by artifacts which make interpreting them more challenging for clinicians. Hence, automated artifact recognition systems have the potential to aid the clinical workflow. In this abstract, we share the first results on applying various machine learning algorithms to the recently released world's largest open-source artifact recognition dataset. We envision that these results will serve as a benchmark for researchers who might work with this dataset in future. Introduction Removal of artifacts from electroencephalogram (EEG) is a necessary step in analyzing EEG signals since artifacts can lead to severe misinterpretation of these signals.


A semi-supervised deep learning algorithm for abnormal EEG identification

arXiv.org Machine Learning

Systems that can automatically analyze EEG signals can aid neurologists by reducing heavy workload and delays. However, such systems need to be first trained using a labeled dataset. While large corpuses of EEG data exist, a fraction of them are labeled. Hand-labeling data increases workload for the very neurologists we try to aid. This paper proposes a semi-supervised learning algorithm that can not only extract meaningful information from large unlabeled EEG datasets but also perform task-specific learning on labeled datasets as small as 5 examples.


SeizureNet: A Deep Convolutional Neural Network for Accurate Seizure Type Classification and Seizure Detection

arXiv.org Machine Learning

Automatic epileptic seizure analysis is important because the differentiation of neural patterns among different patients can be used to classify people with specific types of epilepsy. This could enable more efficient management of the disease. Automatic seizure type classification using clinical electroencephalograms (EEGs) is challenging due to factors such as low signal to noise ratios, signal artefacts, high variance in the seizure semiology among individual epileptic patients, and limited clinical data constraints. To overcome these challenges, in this paper, we present a deep learning based framework which uses a Convolutional Neural Network (CNN) with dense connections and learns highly robust features at different spatial and temporal resolutions of the EEG data spectrum for accurate cross-patient seizure type classification. We evaluate our framework for seizure type classification and seizure detection on the recently released TUH EEG Seizure Corpus, where our framework achieves overall weighted f 1 scores of up to 0.90 and 0.88, thereby setting new benchmarks on the dataset.


Machine Learning for Seizure Type Classification: Setting the benchmark

arXiv.org Machine Learning

Accurate classification of seizure types plays a crucial role in the treatment and disease management of epileptic patients. Epileptic seizure type not only impacts on the choice of drugs but also on the range of activities a patient can safely engage in. With recent advances being made towards artificial intelligence enabled automatic seizure detection, the next frontier is the automatic classification of seizure types. On that note, in this paper, we undertake the first study to explore the application of machine learning algorithms for multi-class seizure type classification. We used the recently released TUH EEG Seizure Corpus and conducted a thorough search space exploration to evaluate the performance of a combination of various pre-processing techniques, machine learning algorithms, and corresponding hyperparameters on this task. We show that our algorithms can reach a weighted F1 score of up to 0.907 thereby setting the first benchmark for scalp EEG based multi-class seizure type classification.