Statistical Deficiency for Task Inclusion Estimation
Fosse, Loïc, Béchet, Frédéric, Favre, Benoît, Damnati, Géraldine, Lecorvé, Gwénolé, Darrin, Maxime, Formont, Philippe, Piantanida, Pablo
... for which annotated datasets exist, and it is commonly accepted that the summarization task, at least in the news domain, requires NER skills to be performed effectively. As a consequence, studying generated summaries from the perspective of retained named entities is a relevant evaluation angle (Pagnoni et al., 2021; Berezin and Batura, 2022; Akani et al., 2023). According to this principle, a more general hypothesis is that multi-task ...
While we theoretically show the shortcomings of naively measuring cross-task performance by directly applying each model to each other task, the contributions of the paper are threefold: A theoretical framework for task definition and inclusion. Based on information concepts and theory, we propose a clear definition of a task and candidate notions of inclusion (independent of the notion of model).
DataMan: Data Manager for Pre-training Large Language Models
Peng, Ru, Yang, Kexin, Zeng, Yawen, Lin, Junyang, Liu, Dayiheng, Zhao, Junbo
The emergence of performance in large language models (LLMs), driven by data scaling laws, makes the selection of pre-training data increasingly important. However, existing methods rely on limited heuristics and human intuition and lack comprehensive, clear guidelines. To address this, we take inspiration from "reverse thinking": prompting LLMs to self-identify which criteria benefit their performance. Because an LLM's pre-training capabilities are related to perplexity (PPL), we derive 14 quality criteria from the causes of text perplexity anomalies and introduce 15 common application domains to support domain mixing. In this paper, we train a Data Manager (DataMan) to learn quality ratings and domain recognition from pointwise rating, and use it to annotate a 447B token pre-training corpus with 14 quality ratings and a domain type. Our experiments validate our approach, using DataMan to select 30B tokens to train a 1.3B-parameter language model, demonstrating significant improvements in in-context learning (ICL), perplexity, and instruction-following ability over the state-of-the-art baseline. The best-performing model, based on the Overall Score l=5, surpasses a model trained with 50% more data using uniform sampling. We continue pre-training with high-rated, domain-specific data annotated by DataMan to enhance domain-specific ICL performance and thus verify DataMan's domain mixing ability. Our findings emphasize the importance of quality ranking, the complementary nature of quality criteria, and their low correlation with perplexity, and we analyze the misalignment between PPL and ICL performance. We also thoroughly analyze our pre-training dataset, examining its composition, the distribution of quality ratings, and the original document sources.
The 13 drugs and supplements that could slow brain ageing
Seven genes have been linked to particularly fast ageing of the brain – but 13 drugs and supplements might reduce their effects. The activity of many genes contributes to the difference between our actual age and the biological age of our brains, defined by how old our cells indicate we are, which creates what is known as a brain age gap. To find genes that accelerate brain ageing and widen this gap, Zhengxing Huang at Zhejiang University in China and his colleagues trained a deep-learning model called 3D-ViT on some medical records and used others to check it gave accurate responses. They then used it to analyse data from nearly 39,000 people who had health, genetic and lifestyle information, along with biological samples, stored in the UK Biobank. These participants were 64 years old, on average, and about half were women.
Discovering Influential Neuron Path in Vision Transformers
Wang, Yifan, Liu, Yifei, Shi, Yingdong, Li, Changming, Pang, Anqi, Yang, Sibei, Yu, Jingyi, Ren, Kan
Vision Transformer models exhibit immense power yet remain opaque to human understanding, posing challenges and risks for practical applications. While prior research has attempted to demystify these models through input attribution and neuron role analysis, there has been a notable gap in considering layer-level information and the holistic path of information flow across layers. In this paper, we investigate the significance of influential neuron paths within vision Transformers, where an influential neuron path is a path of neurons from the model input to the output that impacts the model inference most significantly. We first propose a joint influence measure to assess the contribution of a set of neurons to the model outcome. We further provide a layer-progressive neuron locating approach that efficiently selects the most influential neuron at each layer, in order to discover the crucial neuron path from input to output within the target model. Our experiments demonstrate the superiority of our method over existing baseline solutions in finding the most influential neuron path along which the information flows. Additionally, the neuron paths illustrate that vision Transformers exhibit specific inner working mechanisms for processing visual information within the same image category. We further analyze the key effects of these neurons on the image classification task, showing that the discovered neuron paths preserve the model capability on downstream tasks, which may also shed some light on real-world applications like model pruning.
Transformer (Vaswani et al., 2017) models in the vision domain, such as supervised Vision Transformers (ViT) (Dosovitskiy et al., 2021) or self-supervised pretrained models (He et al., 2022; Oquab et al., 2023), have showcased remarkable performance in various real-world tasks like image classification (Dosovitskiy et al., 2021) and image synthesis (Peebles & Xie, 2023).
However, the inner workings of these vision Transformer models remain elusive, despite their impressive achievements. Understanding the internal mechanisms of vision models is crucial for both research and practical applications. When confronted with the model's decision outputs, one may ask: how does the vision Transformer model process the input information layer by layer, and which parts of the model are most significant in deriving the final outcome? Unraveling the synergy within these models is essential for comprehending machine learning systems.
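The layer-progressive selection described in the abstract can be pictured as a greedy search: at each layer, keep the neuron that maximizes a joint score of the path chosen so far. The sketch below is only an illustration under stated assumptions. The function `greedy_neuron_path`, the `joint_influence` callback, and the toy score table are hypothetical and are not the paper's joint influence measure, which the abstract does not define.

```python
from typing import Callable, List, Sequence

def greedy_neuron_path(
    layer_sizes: Sequence[int],
    joint_influence: Callable[[List[int]], float],
) -> List[int]:
    """Pick one neuron per layer, greedily maximizing the joint score of the path so far."""
    path: List[int] = []
    for num_neurons in layer_sizes:
        # Score every candidate neuron together with the neurons already selected.
        best = max(range(num_neurons), key=lambda j: joint_influence(path + [j]))
        path.append(best)
    return path

# Hypothetical stand-in for a joint influence measure: a per-layer score table
# summed along the path. A real measure would query the model itself.
toy_scores = [
    [0.1, 0.5, 0.2],  # layer 0
    [0.3, 0.1, 0.4],  # layer 1
    [0.9, 0.2, 0.1],  # layer 2
]

def toy_joint_influence(path: List[int]) -> float:
    return sum(toy_scores[layer][neuron] for layer, neuron in enumerate(path))

path = greedy_neuron_path([3, 3, 3], toy_joint_influence)
print(path)
```

A greedy search evaluates only one layer's candidates at a time, so its cost grows with the sum of the layer widths, not with their product as an exhaustive search over all paths would.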
Taxonomy, Opportunities, and Challenges of Representation Engineering for Large Language Models
Wehner, Jan, Abdelnabi, Sahar, Tan, Daniel, Krueger, David, Fritz, Mario
Representation Engineering (RepE) is a novel paradigm for controlling the behavior of LLMs. Unlike traditional approaches that modify inputs or fine-tune the model, RepE directly manipulates the model's internal representations. As a result, it may offer more effective, interpretable, data-efficient, and flexible control over models' behavior. We present the first comprehensive survey of RepE for LLMs, reviewing the rapidly growing literature to address key questions: What RepE methods exist and how do they differ? For what concepts and problems has RepE been applied? What are the strengths and weaknesses of RepE compared to other methods? To answer these, we propose a unified framework describing RepE as a pipeline comprising representation identification, operationalization, and control. We posit that while RepE methods offer significant potential, challenges remain, including managing multiple concepts, ensuring reliability, and preserving models' performance. Towards improving RepE, we identify opportunities for experimental and methodological improvements and construct a guide for best practices.
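To make the identification, operationalization, and control pipeline concrete, here is a minimal sketch of one simple instance: a difference-of-means direction added to hidden states. It is chosen purely for illustration and is not presented as the survey's recommended method. The helper names, the toy activations, and the scaling factor `alpha` are all assumptions.

```python
import numpy as np

def steering_vector(pos_acts: np.ndarray, neg_acts: np.ndarray) -> np.ndarray:
    """Identification: mean activation difference between concept-positive and concept-negative inputs."""
    return pos_acts.mean(axis=0) - neg_acts.mean(axis=0)

def steer(hidden: np.ndarray, direction: np.ndarray, alpha: float) -> np.ndarray:
    """Control: shift every hidden state by alpha along the unit-normalized direction."""
    unit = direction / np.linalg.norm(direction)
    return hidden + alpha * unit

rng = np.random.default_rng(0)
dim = 8
concept_axis = np.zeros(dim)
concept_axis[0] = 2.0  # toy assumption: the concept lives along axis 0

# Toy activations standing in for a model's hidden states at one layer.
pos_acts = rng.normal(size=(32, dim)) + concept_axis
neg_acts = rng.normal(size=(32, dim))

direction = steering_vector(pos_acts, neg_acts)
hidden = rng.normal(size=(5, dim))
steered = steer(hidden, direction, alpha=3.0)

# The projection onto the concept direction increases by exactly alpha.
unit = direction / np.linalg.norm(direction)
shift = (steered - hidden) @ unit
print(shift)
```

In an actual model the shift would be applied inside the forward pass at a chosen layer. Which layer and which `alpha` work well is an empirical question that this sketch does not address.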
Single-Qudit Quantum Neural Networks for Multiclass Classification
Souza, Leandro C., Portugal, Renato
This paper proposes a single-qudit quantum neural network for multiclass classification that uses the enhanced representational capacity of high-dimensional qudit states. Our design employs a $d$-dimensional unitary operator, where $d$ corresponds to the number of classes, constructed using the Cayley transform of a skew-symmetric matrix, to efficiently encode and process class information. This architecture enables a direct mapping between class labels and quantum measurement outcomes, reducing circuit depth and computational overhead. To optimize network parameters, we introduce a hybrid training approach that combines an extended activation function (derived from a truncated multivariable Taylor series expansion) with support vector machine optimization for weight determination. We evaluate our model on the MNIST and EMNIST datasets, demonstrating competitive accuracy while maintaining a compact single-qudit quantum circuit. Our findings highlight the potential of qudit-based QNNs as scalable alternatives to classical deep learning models, particularly for multiclass classification. However, practical implementation remains constrained by current quantum hardware limitations. This research advances quantum machine learning by demonstrating the feasibility of higher-dimensional quantum systems for efficient learning tasks.
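The Cayley transform mentioned above maps a skew-symmetric matrix $A$ to $U = (I + A)^{-1}(I - A)$, which is orthogonal (a real unitary). The sketch below checks this numerically and reads class probabilities off a measured state. The parameterization by upper-triangular entries, the sign convention, and the choice of initial state are assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def cayley_orthogonal(params: np.ndarray, d: int) -> np.ndarray:
    """Build a d x d orthogonal matrix from d(d-1)/2 free parameters via the Cayley transform."""
    A = np.zeros((d, d))
    A[np.triu_indices(d, k=1)] = params
    A = A - A.T  # skew-symmetric: A.T == -A
    I = np.eye(d)
    # I + A is invertible for any real skew-symmetric A, because the
    # eigenvalues of A are purely imaginary and so never equal -1.
    return np.linalg.solve(I + A, I - A)

d = 4  # number of classes, one basis state per class
rng = np.random.default_rng(0)
params = rng.normal(size=d * (d - 1) // 2)
U = cayley_orthogonal(params, d)

# Hypothetical readout: apply U to the first basis state and measure.
state = np.zeros(d)
state[0] = 1.0
probs = np.abs(U @ state) ** 2
predicted_class = int(np.argmax(probs))
print(probs, predicted_class)
```

Because `U` is orthogonal for every parameter value, the measurement probabilities always sum to one without an explicit normalization step.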
Refining Filter Global Feature Weighting for Fully-Unsupervised Clustering
In the context of unsupervised learning, effective clustering plays a vital role in revealing patterns and insights from unlabeled data. However, the success of clustering algorithms often depends on the relevance and contribution of features, which can differ between datasets. This paper explores feature weighting for clustering and presents new weighting strategies, including methods based on SHAP (SHapley Additive exPlanations), a technique commonly used for providing explainability in various supervised machine learning tasks. Rather than using SHAP values only for explainability, we use them to weight features and ultimately improve the clustering process itself in unsupervised scenarios. Our empirical evaluations across five benchmark datasets and clustering methods demonstrate that feature weighting based on SHAP can enhance unsupervised clustering quality, achieving up to a 22.69\% improvement over other weighting methods (from 0.586 to 0.719 in terms of the Adjusted Rand Index). Additionally, the situations in which the weighted data improve the results are highlighted and thoroughly explored, offering insight for practical applications.
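As a rough illustration of why feature weighting can change clustering outcomes, the sketch below rescales columns by normalized weights and compares pairwise distances before and after. The weights here are hand-picked placeholders, not SHAP values, and the toy data and helper names are assumptions. The last lines recompute the relative improvement from the rounded ARI values quoted in the abstract.

```python
import numpy as np

def weight_features(X: np.ndarray, weights) -> np.ndarray:
    """Scale each column by its normalized weight (weights assumed non-negative)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return X * w

def dist(A: np.ndarray, i: int, j: int) -> float:
    return float(np.linalg.norm(A[i] - A[j]))

# Toy data: feature 0 separates two groups (rows 0-1 vs rows 2-3),
# feature 1 is large-scale noise unrelated to the grouping.
X = np.array([[0.0, 5.0], [0.1, -4.0], [1.0, 4.5], [1.1, -5.0]])
Xw = weight_features(X, [0.95, 0.05])  # placeholder weights, not SHAP-derived

same_group_before, cross_group_before = dist(X, 0, 1), dist(X, 0, 2)
same_group_after, cross_group_after = dist(Xw, 0, 1), dist(Xw, 0, 2)
print(same_group_before, cross_group_before, same_group_after, cross_group_after)

# Relative ARI improvement from the rounded figures in the abstract.
relative_improvement = (0.719 - 0.586) / 0.586
print(f"{relative_improvement:.2%}")
```

Before weighting, the noisy feature dominates, so rows in the same group are farther apart than rows in different groups. After weighting, the order flips. The recomputed improvement agrees with the reported 22.69\% up to the rounding of the two ARI values.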
Attention Reveals More Than Tokens: Training-Free Long-Context Reasoning with Attention-guided Retrieval
Zhang, Yuwei, Srinivasa, Jayanth, Liu, Gaowen, Shang, Jingbo
Large Language Models (LLMs) often exhibit substantially shorter effective context lengths than their claimed capacities, especially when handling complex reasoning tasks that require integrating information from multiple parts of a long context and performing multi-step reasoning. Although Chain-of-Thought (CoT) prompting has shown promise in reducing task complexity, our empirical analysis reveals that it does not fully resolve this limitation. Through controlled experiments, we identify poor recall of implicit facts as the primary cause of failure, which significantly hampers reasoning performance. Interestingly, we observe that the internal attention weights from the generated CoT tokens can effectively ground implicit facts, even when these facts are not explicitly recalled. Building on this insight, we propose a novel training-free algorithm, Attrieval, which leverages attention weights to retrieve relevant facts from the long context and incorporates them into the reasoning process. Additionally, we find that selecting context tokens from CoT tokens further improves performance. Our results demonstrate that Attrieval enhances long-context reasoning capability notably on both synthetic and real-world QA datasets with various models.
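The core idea of scoring context facts by the attention they receive from generated CoT tokens can be sketched as follows. This is a toy under stated assumptions and not the Attrieval algorithm itself: the abstract does not say which layers or heads are used or how tokens are filtered, so the function `retrieve_facts`, the averaging scheme, and the synthetic attention matrix are all hypothetical.

```python
import numpy as np

def retrieve_facts(attn: np.ndarray, fact_spans, top_k: int):
    """Rank facts by the mean attention their tokens receive from the CoT tokens.

    attn: array of shape (num_cot_tokens, num_context_tokens).
    fact_spans: list of (start, end) token index ranges, end exclusive.
    """
    token_scores = attn.mean(axis=0)  # aggregate over CoT tokens
    fact_scores = np.array([token_scores[s:e].mean() for s, e in fact_spans])
    ranked = np.argsort(-fact_scores)[:top_k]
    return ranked.tolist(), fact_scores

# Synthetic attention: 3 CoT tokens over a 12-token context holding 3 facts.
attn = np.full((3, 12), 0.01)
attn[:, 4:8] = 0.2                          # CoT tokens attend mostly to the second fact
attn /= attn.sum(axis=1, keepdims=True)     # normalize each row to sum to 1
fact_spans = [(0, 4), (4, 8), (8, 12)]

top_ids, fact_scores = retrieve_facts(attn, fact_spans, top_k=1)
print(top_ids, fact_scores)
```

The retrieved facts would then be placed back into the prompt before the model continues reasoning. How that re-insertion is formatted is left open here.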
Real-Time Risky Fault-Chain Search using Time-Varying Graph RNNs
This paper introduces a data-driven graphical framework for the real-time search of risky cascading fault chains (FCs) in power grids, which is crucial for enhancing grid resiliency in the face of climate change. As extreme weather events driven by climate change increase, identifying risky FCs becomes essential for mitigating cascading failures and ensuring grid stability. However, the complexity of the spatio-temporal dependencies among grid components and the exponential growth of the search space with system size pose significant challenges to modeling and risky FC search. To tackle this, we model the search process as a partially observable Markov decision process (POMDP), which is subsequently solved via a time-varying graph recurrent neural network (GRNN). This approach captures the spatial and temporal structure induced by the system's topology and dynamics, while efficiently summarizing the system's history in the GRNN's latent space, enabling scalable and effective identification of risky FCs.
BiasConnect: Investigating Bias Interactions in Text-to-Image Models
Shukla, Pushkar, Chinchure, Aditya, Diana, Emily, Tolbert, Alexander, Hosanagar, Kartik, Balasubramanian, Vineeth N., Sigal, Leonid, Turk, Matthew A.
The biases exhibited by Text-to-Image (TTI) models are often treated as if they are independent, but in reality, they may be deeply interrelated. Addressing bias along one dimension, such as ethnicity or age, can inadvertently influence another dimension, like gender, either mitigating or exacerbating existing disparities. Understanding these interdependencies is crucial for designing fairer generative models, yet measuring such effects quantitatively remains a challenge. In this paper, we aim to address this challenge by introducing BiasConnect, a novel tool designed to analyze and quantify bias interactions in TTI models. Our approach leverages a counterfactual-based framework to generate pairwise causal graphs that reveal the underlying structure of bias interactions for a given text prompt. Additionally, our method provides empirical estimates that indicate how other bias dimensions shift toward or away from an ideal distribution when a given bias is modified. Our estimates have a strong correlation (+0.69) with the interdependency observations post bias mitigation. We demonstrate the utility of BiasConnect for selecting optimal bias mitigation axes, comparing different TTI models on the dependencies they learn, and understanding the amplification of intersectional societal biases in TTI models.