feature generation
- Asia > Singapore (0.05)
- North America > United States (0.04)
- Asia > China > Hong Kong (0.04)
- (2 more...)
- Banking & Finance > Trading (1.00)
- Information Technology (0.93)
- Law (0.93)
DeepFeature: Iterative Context-aware Feature Generation for Wearable Biosignals
Liu, Kaiwei, He, Yuting, Yang, Bufang, Yuan, Mu, Wong, Chun Man Victor, Sze, Ho Pong Andrew, Yan, Zhenyu, Chen, Hongkai
Biosignals collected from wearable devices are widely utilized in healthcare applications. Machine learning models used in these applications often rely on features extracted from biosignals due to their effectiveness, lower data dimensionality, and wide compatibility across various model architectures. However, existing feature extraction methods often lack task-specific contextual knowledge, struggle to identify optimal feature extraction settings in high-dimensional feature space, and are prone to code generation and automation errors. In this paper, we propose DeepFeature, the first LLM-empowered, context-aware feature generation framework for wearable biosignals. DeepFeature introduces a multi-source feature generation mechanism that integrates expert knowledge with task settings. It also employs an iterative feature refinement process that uses feature assessment-based feedback for feature re-selection. Additionally, DeepFeature utilizes a robust multi-layer filtering and verification approach for robust feature-to-code translation to ensure that the extraction functions run without crashing. Experimental evaluation results show that DeepFeature achieves an average AUROC improvement of 4.21-9.67% across eight diverse tasks compared to baseline methods. It outperforms state-of-the-art approaches on five tasks while maintaining comparable performance on the remaining tasks.
- Asia > China > Hong Kong (0.04)
- Asia > China > Beijing > Beijing (0.04)
- North America > United States > New York (0.04)
- (2 more...)
- Asia > Singapore (0.05)
- North America > United States (0.04)
- Asia > China > Hong Kong (0.04)
- (2 more...)
- Banking & Finance > Trading (1.00)
- Information Technology (0.93)
- Law (0.93)
Bellman Error Based Feature Generation using Random Projections on Sparse Spaces
This paper addresses the problem of automatic generation of features for value function approximation in reinforcement learning. Bellman Error Basis Functions (BEBFs) have been shown to improve the error of policy evaluation with function approximation, with a convergence rate similar to that of value iteration. We propose a simple, fast and robust algorithm based on random projections, which generates BEBFs for sparse feature spaces. We provide a finite sample analysis of the proposed method, and prove that projections logarithmic in the dimension of the original space guarantee a contraction in the error. Empirical results demonstrate the strength of this method in domains in which choosing a good state representation is challenging.
Reinforcement Learning-based Feature Generation Algorithm for Scientific Data
Xiao, Meng, Zhou, Junfeng, Zhou, Yuanchun
Feature generation (FG) aims to enhance the prediction potential of original data by constructing high-order feature combinations and removing redundant features. It is a key preprocessing step for tabular scientific data to improve downstream machine-learning model performance. Traditional methods face the following two challenges when dealing with the feature generation of scientific data: First, the effective construction of high-order feature combinations in scientific data necessitates profound and extensive domain-specific expertise. Secondly, as the order of feature combinations increases, the search space expands exponentially, imposing prohibitive human labor consumption. Advancements in the Data-Centric Artificial Intelligence (DCAI) paradigm have opened novel avenues for automating feature generation processes. Inspired by that, this paper revisits the conventional feature generation workflow and proposes the Multi-agent Feature Generation (MAFG) framework. Specifically, in the iterative exploration stage, multi-agents will construct mathematical transformation equations collaboratively, synthesize and identify feature combinations ex-hibiting high information content, and leverage a reinforcement learning mechanism to evolve their strategies. Upon completing the exploration phase, MAFG integrates the large language models (LLMs) to interpreta-tively evaluate the generated features of each significant model performance breakthrough. Experimental results and case studies consistently demonstrate that the MAFG framework effectively automates the feature generation process and significantly enhances various downstream scientific data mining tasks.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > China > Beijing > Beijing (0.05)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- (6 more...)
- Education (1.00)
- Health & Medicine > Therapeutic Area > Nephrology (0.46)
Few-Shot Inspired Generative Zero-Shot Learning
Shohag, Md Shakil Ahamed, Wu, Q. M. Jonathan, Pourpanah, Farhad
Generative zero-shot learning (ZSL) methods typically synthesize visual features for unseen classes using predefined semantic attributes, followed by training a fully supervised classification model. While effective, these methods require substantial computational resources and extensive synthetic data, thereby relaxing the original ZSL assumptions. In this paper, we propose FSIGenZ, a few-shot-inspired generative ZSL framework that reduces reliance on large-scale feature synthesis. Our key insight is that class-level attributes exhibit instance-level variability, i.e., some attributes may be absent or partially visible, yet conventional ZSL methods treat them as uniformly present. To address this, we introduce Model-Specific Attribute Scoring (MSAS), which dynamically re-scores class attributes based on model-specific optimization to approximate instance-level variability without access to unseen data. We further estimate group-level prototypes as clusters of instances based on MSAS-adjusted attribute scores, which serve as representative synthetic features for each unseen class. To mitigate the resulting data imbalance, we introduce a Dual-Purpose Semantic Regularization (DPSR) strategy while training a semantic-aware contrastive classifier (SCC) using these prototypes. Experiments on SUN, AwA2, and CUB benchmarks demonstrate that FSIGenZ achieves competitive performance using far fewer synthetic features.
Two-Stage Feature Generation with Transformer and Reinforcement Learning
Gao, Wanfu, Man, Zengyao, He, Zebin, Tang, Yuhao, Gao, Jun, Liu, Kunpeng
Feature generation is a critical step in machine learning, aiming to enhance model performance by capturing complex relationships within the data and generating meaningful new features. Traditional feature generation methods heavily rely on domain expertise and manual intervention, making the process labor-intensive and challenging to adapt to different scenarios. Although automated feature generation techniques address these issues to some extent, they often face challenges such as feature redundancy, inefficiency in feature space exploration, and limited adaptability to diverse datasets and tasks. To address these problems, we propose a Two-Stage Feature Generation (TSFG) framework, which integrates a Transformer-based encoder-decoder architecture with Proximal Policy Optimization (PPO). The encoder-decoder model in TSFG leverages the Transformer's self-attention mechanism to efficiently represent and transform features, capturing complex dependencies within the data. PPO further enhances TSFG by dynamically adjusting the feature generation strategy based on task-specific feedback, optimizing the process for improved performance and adaptability. TSFG dynamically generates high-quality feature sets, significantly improving the predictive performance of machine learning models. Experimental results demonstrate that TSFG outperforms existing state-of-the-art methods in terms of feature quality and adaptability.
- North America > United States > Oregon > Multnomah County > Portland (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- (4 more...)
Agentic Feature Augmentation: Unifying Selection and Generation with Teaming, Planning, and Memories
Gong, Nanxu, Dong, Sixun, Bai, Haoyue, Wang, Xinyuan, Ying, Wangyang, Fu, Yanjie
As a widely-used and practical tool, feature engineering transforms raw data into discriminative features to advance AI model performance. However, existing methods usually apply feature selection and generation separately, failing to strive a balance between reducing redundancy and adding meaningful dimensions. To fill this gap, we propose an agentic feature augmentation concept, where the unification of feature generation and selection is modeled as agentic teaming and planning. Specifically, we develop a Multi-Agent System with Long and Short-Term Memory (MAGS), comprising a selector agent to eliminate redundant features, a generator agent to produce informative new dimensions, and a router agent that strategically coordinates their actions. We leverage in-context learning with short-term memory for immediate feedback refinement and long-term memory for globally optimal guidance. Additionally, we employ offline Proximal Policy Optimization (PPO) reinforcement fine-tuning to train the router agent for effective decision-making to navigate a vast discrete feature space. Extensive experiments demonstrate that this unified agentic framework consistently achieves superior task performance by intelligently orchestrating feature selection and generation.
- North America > United States > Tennessee > Davidson County > Nashville (0.04)
- North America > United States > Arizona (0.04)
- Europe > Germany > Bavaria > Lower Franconia > Würzburg (0.04)
Dual-Agent Reinforcement Learning for Automated Feature Generation
Gao, Wanfu, Man, Zengyao, Pan, Hanlin, Liu, Kunpeng
Feature generation involves creating new features from raw data to capture complex relationships among the original features, improving model robustness and machine learning performance. Current methods using reinforcement learning for feature generation have made feature exploration more flexible and efficient. However, several challenges remain: first, during feature expansion, a large number of redundant features are generated. When removing them, current methods only retain the best features each round, neglecting those that perform poorly initially but could improve later. Second, the state representation used by current methods fails to fully capture complex feature relationships. Third, there are significant differences between discrete and continuous features in tabular data, requiring different operations for each type. To address these challenges, we propose a novel dual-agent reinforcement learning method for feature generation. Two agents are designed: the first generates new features, and the second determines whether they should be preserved. A self-attention mechanism enhances state representation, and diverse operations distinguish interactions between discrete and continuous features. The experimental results on multiple datasets demonstrate that the proposed method is effective. The code is available at https://github.com/extess0/DARL.
- North America > United States > Oregon > Multnomah County > Portland (0.04)
- North America > United States > Ohio > Franklin County > Columbus (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- (3 more...)