feature label
Automated Feature Labeling with Token-Space Gradient Descent
Schulz, Julian, Fallows, Seamus
We present a novel approach to feature labeling using gradient descent in token-space. While existing methods typically use language models to generate hypotheses about feature meanings, our method directly optimizes label representations by using a language model as a discriminator to predict feature activations. We formulate this as a multi-objective optimization problem in token-space, balancing prediction accuracy, entropy minimization, and linguistic naturalness. Our proof-of-concept experiments demonstrate successful convergence to interpretable single-token labels across diverse domains, including features for detecting animals, mammals, Chinese text, and numbers. Although our current implementation is constrained to single-token labels and relatively simple features, the results suggest that token-space gradient descent could become a valuable addition to the interpretability researcher's toolkit. Recent work in mechanistic interpretability has made significant progress in decomposing neural networks into interpretable features.
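The optimization described in this abstract can be illustrated with a toy sketch. It assumes a tiny hypothetical vocabulary and replaces the language-model discriminator with a fixed score table (`scores[t][i]` is the activation the stand-in discriminator predicts for example `i` when given label token `t`). Only the prediction-accuracy and entropy terms are modeled; the linguistic-naturalness term is omitted. All names and numbers here are illustrative assumptions, not the authors' implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def optimize_label(vocab, scores, targets, lam=0.1, lr=1.0, steps=300):
    """Gradient descent on token logits: mean squared error plus lam * entropy."""
    n = len(targets)
    k = len(vocab)
    logits = [0.0] * k
    for _ in range(steps):
        p = softmax(logits)
        # Predicted activation is the probability-weighted mix of per-token scores.
        pred = [sum(p[t] * scores[t][i] for t in range(k)) for i in range(n)]
        grad_p = []
        for t in range(k):
            mse_grad = (2.0 / n) * sum(
                (pred[i] - targets[i]) * scores[t][i] for i in range(n)
            )
            # Derivative of the entropy -sum(p log p) with respect to p[t].
            ent_grad = -(math.log(max(p[t], 1e-12)) + 1.0)
            grad_p.append(mse_grad + lam * ent_grad)
        # Chain rule through the softmax.
        mean_g = sum(p[t] * grad_p[t] for t in range(k))
        logits = [logits[t] - lr * p[t] * (grad_p[t] - mean_g) for t in range(k)]
    p = softmax(logits)
    best = max(range(k), key=lambda t: p[t])
    return vocab[best], p


# Hypothetical data: the feature fires on the first two examples only.
VOCAB = ["animal", "number", "city"]
SCORES = [
    [0.9, 0.8, 0.1, 0.0],  # stand-in discriminator output for "animal"
    [0.0, 0.1, 0.9, 0.1],  # ... for "number"
    [0.1, 0.0, 0.0, 0.9],  # ... for "city"
]
TARGETS = [1.0, 1.0, 0.0, 0.0]

label, probs = optimize_label(VOCAB, SCORES, TARGETS)
```

The entropy penalty is what pushes the distribution toward a single token, so the result can be read off as a one-token label.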
DRUPI: Dataset Reduction Using Privileged Information
Wang, Shaobo, Yang, Yantai, Zhang, Shuaiyu, Sun, Chenghao, Li, Weiya, Hu, Xuming, Zhang, Linfeng
Dataset reduction (DR) seeks to select or distill samples from large datasets into smaller subsets while preserving performance on target tasks. Existing methods primarily focus on pruning or synthesizing data in the same format as the original dataset, typically the input data and corresponding labels. However, in DR settings, we find it is possible to synthesize more information beyond the data-label pair as an additional learning target to facilitate model training. In this paper, we introduce Dataset Reduction Using Privileged Information (DRUPI), which enriches DR by synthesizing privileged information alongside the reduced dataset. This privileged information can take the form of feature labels or attention labels, providing auxiliary supervision to improve model learning. Our findings reveal that effective feature labels must balance between being overly discriminative and excessively diverse, with a moderate level proving optimal for improving the reduced dataset's efficacy. Extensive experiments on ImageNet, CIFAR-10/100, and Tiny ImageNet demonstrate that DRUPI integrates seamlessly with existing dataset reduction methods, offering significant performance gains. The code will be released after the paper is accepted. Dataset Reduction (DR) has attracted considerable attention in recent years, with the primary aim of compressing large datasets into smaller subsets while maintaining comparable statistical performance. Existing methods for DR can be broadly classified into two main categories: coreset selection and dataset distillation. In typical real-world scenarios, training models for target tasks is generally constrained to input data (e.g., images) and their corresponding labels, as these are the most readily available elements.
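One plausible reading of "feature labels as auxiliary supervision" is a training loss that adds a feature-matching term to the usual classification loss. The sketch below assumes a mean-squared-error penalty between a model's intermediate features and a synthesized feature label, weighted by a hypothetical coefficient `lam`; the abstract does not specify the exact form of the loss, so this is an illustration only.

```python
import math


def cross_entropy(class_probs, label):
    """Negative log-likelihood of the true class."""
    return -math.log(max(class_probs[label], 1e-12))


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def loss_with_feature_label(class_probs, label, features, feature_label, lam=0.5):
    """Classification loss plus an assumed auxiliary feature-matching term."""
    return cross_entropy(class_probs, label) + lam * mse(features, feature_label)


# When the model's features match the feature label, only the class loss remains.
matched = loss_with_feature_label([0.5, 0.5], 0, [1.0, 2.0], [1.0, 2.0])
mismatched = loss_with_feature_label([0.5, 0.5], 0, [0.0, 0.0], [1.0, 2.0])
```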
Knockoff-Guided Feature Selection via A Single Pre-trained Reinforced Agent
Wang, Xinyuan, Wang, Dongjie, Ying, Wangyang, Xie, Rui, Chen, Haifeng, Fu, Yanjie
Feature selection prepares the AI-readiness of data by eliminating redundant features. Prior research falls into two primary categories: i) Supervised Feature Selection (SFS), which identifies the optimal feature subset based on relevance to the target variable; ii) Unsupervised Feature Selection (UFS), which reduces the dimensionality of the feature space by capturing the essential information within the feature set instead of using the target variable. However, SFS approaches suffer from time-consuming processes and limited generalizability due to their dependence on the target variable and downstream ML tasks. UFS methods are constrained because the derived feature space is latent and untraceable. To address these challenges, we introduce an innovative framework for feature selection, which is guided by knockoff features and optimized through reinforcement learning, to identify the optimal and effective feature subset. In detail, our method involves generating "knockoff" features that replicate the distribution and characteristics of the original features but are independent of the target variable. Each feature is then assigned a pseudo label based on its correlation with all the knockoff features, serving as a novel metric for feature evaluation. Our approach utilizes these pseudo labels to guide the feature selection process in three novel ways, optimized by a single reinforced agent: 1) A deep Q-network, pre-trained with the original features and their corresponding pseudo labels, is employed to improve the efficacy of the exploration process in feature selection. 2) We introduce unsupervised rewards to evaluate the feature subset quality based on the pseudo labels and the feature space reconstruction loss to reduce dependencies on the target variable. 3) A new ε-greedy strategy is used, incorporating insights from the pseudo labels to make the feature selection process more effective.
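The pseudo-labeling step can be sketched as follows. The sketch makes two loud assumptions: knockoffs are approximated by independently permuting each feature column (which preserves each marginal distribution and breaks any link to a target, but is much cruder than a proper knockoff construction), and the pseudo label is taken to be the mean absolute Pearson correlation with all knockoffs. The function names are hypothetical and the abstract does not give the exact formula.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either column is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def permutation_knockoffs(columns, seed=0):
    """Crude stand-in for knockoffs: shuffle each column independently."""
    rng = random.Random(seed)
    knockoffs = []
    for col in columns:
        shuffled = list(col)
        rng.shuffle(shuffled)
        knockoffs.append(shuffled)
    return knockoffs


def pseudo_labels(columns, knockoffs):
    """Assumed score: mean absolute correlation of a feature with every knockoff."""
    return [
        sum(abs(pearson(col, k)) for k in knockoffs) / len(knockoffs)
        for col in columns
    ]


# Three toy feature columns; the last one is constant and carries no signal.
columns = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [2.0, 1.0, 4.0, 3.0, 6.0, 5.0],
    [5.0, 5.0, 5.0, 5.0, 5.0, 5.0],
]
labels = pseudo_labels(columns, permutation_knockoffs(columns))
```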
Towards Detecting Harmful Agendas in News Articles
Subbiah, Melanie, Bhattacharjee, Amrita, Hua, Yilun, Kumarage, Tharindu, Liu, Huan, McKeown, Kathleen
Manipulated news online is a growing problem which necessitates the use of automated systems to curtail its spread. We argue that while misinformation and disinformation detection have been studied, there has been a lack of investment in the important open challenge of detecting harmful agendas in news articles; identifying harmful agendas is critical to flag news campaigns with the greatest potential for real world harm. Moreover, due to real concerns around censorship, harmful agenda detectors must be interpretable to be effective. In this work, we propose this new task and release a dataset, NewsAgendas, of annotated news articles for agenda identification. We show how interpretable systems can be effective on this task and demonstrate that they can perform comparably to black-box models.
Interventional Probing in High Dimensions: An NLI Case Study
Rozanova, Julia, Valentino, Marco, Cordeiro, Lucas, Freitas, Andre
Probing strategies have been shown to detect the presence of various linguistic features in large language models; in particular, semantic features intermediate to the "natural logic" fragment of the Natural Language Inference task (NLI). In the case of natural logic, the relation between the intermediate features and the entailment label is explicitly known: as such, this provides a ripe setting for interventional studies on the NLI models' representations, allowing for stronger causal conjectures and a deeper critical analysis of interventional probing methods. In this work, we carry out new and existing representation-level interventions to investigate the effect of these semantic features on NLI classification: we perform amnesic probing (which removes features as directed by learned linear probes) and introduce the mnestic probing variation (which forgets all dimensions except the probe-selected ones). Furthermore, we delve into the limitations of these methods and outline some pitfalls that have been obscuring the effectiveness of interventional probing studies.
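The contrast between the two interventions can be shown with a small linear-algebra sketch. It assumes the probe is given as a list of weight vectors and applies a single orthogonal projection: the amnesic operation removes the component of a representation lying in the probe's span, and the mnestic operation keeps only that component. Published amnesic probing typically repeats such projections iteratively with retrained probes; that loop is not modeled here, and the function names are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def orthonormal_basis(probe_rows, tol=1e-12):
    """Gram-Schmidt over the probe's weight vectors."""
    basis = []
    for row in probe_rows:
        v = list(row)
        for b in basis:
            c = dot(v, b)
            v = [vi - c * bi for vi, bi in zip(v, b)]
        norm = dot(v, v) ** 0.5
        if norm > tol:  # skip directions already covered by the basis
            basis.append([vi / norm for vi in v])
    return basis


def mnestic(x, basis):
    """Keep only the probe-selected directions of x."""
    out = [0.0] * len(x)
    for b in basis:
        c = dot(x, b)
        out = [o + c * bi for o, bi in zip(out, b)]
    return out


def amnesic(x, basis):
    """Remove the probe-selected directions from x."""
    kept = mnestic(x, basis)
    return [xi - ki for xi, ki in zip(x, kept)]


probe = [[1.0, 1.0, 0.0]]  # hypothetical single-direction linear probe
x = [3.0, 1.0, 2.0]        # hypothetical representation vector
basis = orthonormal_basis(probe)
forgotten = amnesic(x, basis)
remembered = mnestic(x, basis)
```

By construction the two outputs sum back to the original vector, which makes it easy to check that an intervention removed exactly what the probe selected and nothing else.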
Generating custom photo-realistic faces using AI – Insight Data
All the code and online demo are available at the project page. Describing an image is easy for humans, and we are able to do it from a very young age. In machine learning, this task is a discriminative classification/regression problem, i.e. predicting feature labels from input images. Recent advancements in ML/AI techniques, especially deep learning models, are beginning to excel in these tasks, sometimes reaching or exceeding human performance, as is demonstrated in scenarios like visual object recognition (e.g. from AlexNet to ResNet on ImageNet classification) and object detection/segmentation (e.g. from RCNN to YOLO on COCO dataset), etc. However, the other way around, generating realistic images based on descriptions, is much harder, and takes years of graphic design training.
AdaFlock: Adaptive Feature Discovery for Human-in-the-loop Predictive Modeling
Takahama, Ryusuke (scouty Inc.) | Baba, Yukino (Kyoto University) | Shimizu, Nobuyuki (Yahoo Japan Corporation) | Fujita, Sumio (Yahoo Japan Corporation) | Kashima, Hisashi (Kyoto University)
Feature engineering is the key to successful application of machine learning algorithms to real-world data. The discovery of informative features often requires domain knowledge or human inspiration, and data scientists expend a certain amount of effort on exploring feature spaces. Crowdsourcing is considered a promising approach for allowing many people to be involved in feature engineering; however, there is a demand for a sophisticated strategy that enables us to acquire good features at a reasonable crowdsourcing cost. In this paper, we present a novel algorithm called AdaFlock to efficiently obtain informative features through crowdsourcing. AdaFlock is inspired by AdaBoost, which iteratively trains classifiers by increasing the weights of samples misclassified by previous classifiers. AdaFlock iteratively generates informative features; at each iteration of AdaFlock, crowdsourcing workers are shown samples selected according to the classification errors of the current classifiers and are asked to generate new features that are helpful for correctly classifying the given examples. The results of our experiments conducted using real datasets indicate that AdaFlock successfully discovers informative features with fewer iterations and achieves high classification accuracy.
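The AdaBoost mechanism that AdaFlock builds on can be sketched as a weight update followed by a sample-selection step. The update below is the standard discrete AdaBoost rule and assumes the input weights sum to one; picking the `k` highest-weight samples to show to workers is an assumption made for illustration, since the abstract only says samples are selected according to classification errors.

```python
import math


def reweight(weights, misclassified):
    """One AdaBoost-style update: raise weights of misclassified samples."""
    err = sum(w for w, m in zip(weights, misclassified) if m)
    err = min(max(err, 1e-10), 1.0 - 1e-10)  # keep the log finite
    alpha = 0.5 * math.log((1.0 - err) / err)
    updated = [
        w * math.exp(alpha if m else -alpha)
        for w, m in zip(weights, misclassified)
    ]
    total = sum(updated)
    return [u / total for u in updated]


def samples_for_workers(weights, k):
    """Hypothetical selection rule: indices of the k highest-weight samples."""
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return order[:k]


# Four samples with uniform weights; only sample 1 is currently misclassified.
new_weights = reweight([0.25, 0.25, 0.25, 0.25], [False, True, False, False])
shown = samples_for_workers(new_weights, 1)
```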