AITopics

Country: Europe (0.28)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Laetitia Papaxanthos, Felipe Llinares-López, Dean Bodenham, Karsten Borgwardt

Finding significant combinations of features in the presence of categorical covariates

Neural Information Processing SystemsMar-23-2026, 00:51:33 GMT

In high-dimensional settings, where the number of features pis much larger than the number of samples n, methods that systematically examine arbitrary combinations of features have only recently begun to be explored. However, none of the current methods is able to assess the association between feature combinations and a target variable while conditioning on a categorical covariate. As a result, many false discoveries might occur due to unaccounted confounding effects. We propose the Fast Automatic Conditional Search (FACS) algorithm, a significant discriminative itemset mining method which conditions on categorical covariates and only scales as O(klog k), where k is the number of states of the categorical covariate. Based on the Cochran-Mantel-Haenszel Test, FACS demonstrates superior speed and statistical power on simulated and real-world datasets compared to the state of the art, opening the door to numerous applications in biomedicine.

artificial intelligence, criterion, machine learning, (16 more...)

Country: North America > United States (0.28)

Genre: Research Report > Experimental Study (0.31)

Industry:

Health & Medicine > Therapeutic Area > Oncology (0.68)
Health & Medicine > Pharmaceuticals & Biotechnology (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing SystemsFeb-9-2026, 21:16:01 GMT

8f1fa0193ca2b5d2fa0695827d8270e9-Paper.pdf

decision-making process, prediction, voten, (15 more...)

Country:

Asia > Middle East > Jordan (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report > New Finding (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing SystemsDec-24-2025, 06:31:40 GMT

Sparse Interaction Additive Networks via Feature Interaction Detection and Sparse Selection

There is currently a large gap in performance between the statistically rigorous methods like linear regression or additive splines and the powerful deep methods using neural networks. Previous works attempting to close this gap have failed to fully consider the exponentially growing number of feature combinations which deep networks consider automatically during training. In this work, we develop a tractable selection algorithm to efficiently identify the necessary feature combinations by leveraging techniques in feature interaction detection.Our proposed Sparse Interaction Additive Networks (SIAN) construct a bridge from these simple and interpretable models to a fully connected neural network. SIAN achieves competitive performance against state-of-the-art methods across multiple large-scale tabular datasets and consistently finds an optimal tradeoff between the modeling capacity of neural networks and the generalizability of simpler methods.

feature interaction detection, interaction detection and sparse selection, sparse interaction additive network, (5 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.76)

arXiv.org Artificial IntelligenceOct-13-2025

MATT-CTR: Unleashing a Model-Agnostic Test-Time Paradigm for CTR Prediction with Confidence-Guided Inference Paths

Zhang, Moyu, Chen, Yun, Jin, Yujun, Hu, Jinxin, Zhang, Yu, Zeng, Xiaoyi

Recently, a growing body of research has focused on either optimizing CTR model architectures to better model feature interactions or refining training objectives to aid parameter learning, thereby achieving better predictive performance. However, previous efforts have primarily focused on the training phase, largely neglecting opportunities for optimization during the inference phase. Infrequently occurring feature combinations, in particular, can degrade prediction performance, leading to unreliable or low-confidence outputs. To unlock the predictive potential of trained CTR models, we propose a Model-Agnostic Test-Time paradigm (MATT), which leverages the confidence scores of feature combinations to guide the generation of multiple inference paths, thereby mitigating the influence of low-confidence features on the final prediction. Specifically, to quantify the confidence of feature combinations, we introduce a hierarchical probabilistic hashing method to estimate the occurrence frequencies of feature combinations at various orders, which serve as their corresponding confidence scores. Then, using the confidence scores as sampling probabilities, we generate multiple instance-specific inference paths through iterative sampling and subsequently aggregate the prediction scores from multiple paths to conduct robust predictions. Finally, extensive offline experiments and online A/B tests strongly validate the compatibility and effectiveness of MATT across existing CTR models.

artificial intelligence, machine learning, natural language, (18 more...)

2510.08932

Country:

North America (0.46)
Asia > China (0.15)

Genre: Research Report > Experimental Study (0.67)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Neural Information Processing SystemsAug-15-2025, 22:42:51 GMT

Discerning Decision-Making Process of Deep Neural Networks with Hierarchical Voting Transformation Ying Sun

While it is expected to understand their intrinsic decision-making processes, these deep neural networks often work in a black-box way.

decision-making process, prediction, voten, (15 more...)

Country:

Asia > Middle East > Jordan (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report > New Finding (0.93)

Industry: Transportation (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Xiao, Meng, Zhou, Junfeng, Zhou, Yuanchun

Reinforcement Learning-based Feature Generation Algorithm for Scientific Data

arXiv.org Artificial IntelligenceJul-10-2025

Feature generation (FG) aims to enhance the prediction potential of original data by constructing high-order feature combinations and removing redundant features. It is a key preprocessing step for tabular scientific data to improve downstream machine-learning model performance. Traditional methods face the following two challenges when dealing with the feature generation of scientific data: First, the effective construction of high-order feature combinations in scientific data necessitates profound and extensive domain-specific expertise. Secondly, as the order of feature combinations increases, the search space expands exponentially, imposing prohibitive human labor consumption. Advancements in the Data-Centric Artificial Intelligence (DCAI) paradigm have opened novel avenues for automating feature generation processes. Inspired by that, this paper revisits the conventional feature generation workflow and proposes the Multi-agent Feature Generation (MAFG) framework. Specifically, in the iterative exploration stage, multi-agents will construct mathematical transformation equations collaboratively, synthesize and identify feature combinations ex-hibiting high information content, and leverage a reinforcement learning mechanism to evolve their strategies. Upon completing the exploration phase, MAFG integrates the large language models (LLMs) to interpreta-tively evaluate the generated features of each significant model performance breakthrough. Experimental results and case studies consistently demonstrate that the MAFG framework effectively automates the feature generation process and significantly enhances various downstream scientific data mining tasks.

machine learning, reinforcement, reinforcement learning, (11 more...)

2507.03498

Country:

Asia (0.69)
North America > United States > Louisiana (0.14)

Genre: Workflow (0.68)

Industry:

Education (1.00)
Health & Medicine > Therapeutic Area > Nephrology (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Artificial IntelligenceJul-8-2025

TayFCS: Towards Light Feature Combination Selection for Deep Recommender Systems

Wang, Xianquan, Du, Zhaocheng, Zhu, Jieming, Wu, Chuhan, Jia, Qinglin, Dong, Zhenhua

Feature interaction modeling is crucial for deep recommendation models. A common and effective approach is to construct explicit feature combinations to enhance model performance. However, in practice, only a small fraction of these combinations are truly informative. Thus it is essential to select useful feature combinations to reduce noise and manage memory consumption. While feature selection methods have been extensively studied, they are typically limited to selecting individual features. Extending these methods for high-order feature combination selection presents a significant challenge due to the exponential growth in time complexity when evaluating feature combinations one by one. In this paper, we propose $\textbf{TayFCS}$, a lightweight feature combination selection method that significantly improves model performance. Specifically, we propose the Taylor Expansion Scorer (TayScorer) module for field-wise Taylor expansion on the base model. Instead of evaluating all potential feature combinations' importance by repeatedly running experiments with feature adding and removal, this scorer only needs to approximate the importance based on their sub-components' gradients. This can be simply computed with one backward pass based on a trained recommendation model. To further reduce information redundancy among feature combinations and their sub-components, we introduce Logistic Regression Elimination (LRE), which estimates the corresponding information gain based on the model prediction performance. Experimental results on three benchmark datasets validate both the effectiveness and efficiency of our approach. Furthermore, online A/B test results demonstrate its practical applicability and commercial value.

artificial intelligence, feature combination, machine learning, (15 more...)

doi: 10.1145/3711896.3737266

2507.03895

Country:

Asia (1.00)
North America > United States (0.94)

Genre:

Research Report > New Finding (0.86)
Research Report > Experimental Study (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

Nikolikj, Ana, Ochoa, Gabriela, Eftimov, Tome

Customized Exploration of Landscape Features Driving Multi-Objective Combinatorial Optimization Performance

arXiv.org Artificial IntelligenceJul-3-2025

We present an analysis of landscape features for predicting the performance of multi-objective combinatorial optimization algorithms. We consider features from the recently proposed compressed Pareto Local Optimal Solutions Networks (C-PLOS-net) model of combinatorial landscapes. The benchmark instances are a set of rmnk-landscapes with 2 and 3 objectives and various levels of ruggedness and objective correlation. We consider the performance of three algorithms -- Pareto Local Search (PLS), Global Simple EMO Optimizer (GSEMO), and Non-dominated Sorting Genetic Algorithm (NSGA-II) - using the resolution and hypervolume metrics. Our tailored analysis reveals feature combinations that influence algorithm performance specific to certain landscapes. This study provides deeper insights into feature importance, tailored to specific rmnk-landscapes and algorithms.

artificial intelligence, evolutionary algorithm, machine learning, (19 more...)

2507.01638

Country:

Europe > Slovenia (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > Scotland > Stirling > Stirling (0.04)
(4 more...)

Genre: Research Report > Experimental Study (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.90)
(2 more...)

Samarasinghage, Chanuka Don, Gulabani, Dhruv

Towards Interpretable and Efficient Feature Selection in Trajectory Datasets: A Taxonomic Approach

arXiv.org Artificial IntelligenceJun-26-2025

Trajectory analysis is not only about obtaining movement data, but it is also of paramount importance in understanding the pattern in which an object moves through space and time, as well as in predicting its next move. Due to the significant interest in the area, data collection has improved substantially, resulting in a large number of features becoming available for training and predicting models. However, this introduces a high-dimensionality-induced feature explosion problem, which reduces the efficiency and interpretability of the data, thereby reducing the accuracy of machine learning models. To overcome this issue, feature selection has become one of the most prevalent tools. Thus, the objective of this paper was to introduce a taxonomy-based feature selection method that categorizes features based on their internal structure. This approach classifies the data into geometric and kinematic features, further categorizing them into curvature, indentation, speed, and acceleration. The comparative analysis indicated that a taxonomy-based approach consistently achieved comparable or superior predictive performance. Furthermore, due to the taxonomic grouping, which reduces combinatorial space, the time taken to select features was drastically reduced. The taxonomy was also used to gain insights into what feature sets each dataset was more sensitive to. Overall, this study provides robust evidence that a taxonomy-based feature selection method can add a layer of interpretability, reduce dimensionality and computational complexity, and contribute to high-level decision-making. It serves as a step toward providing a methodological framework for researchers and practitioners dealing with trajectory datasets and contributing to the broader field of explainable artificial intelligence.

artificial intelligence, dataset, machine learning, (18 more...)

2506.20359

Country: North America > Canada > Ontario > National Capital Region > Ottawa (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Transportation (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)