multicollinearity




Crop recommendation with machine learning: leveraging environmental and economic factors for optimal crop selection

Sam, Steven, DAbreo, Silima Marshal

arXiv.org Artificial Intelligence

Department of Computer Science, College of Engineering, Design and Physical Science, Brunel University London. steven.sam@brunel.ac.uk

Abstract: Agriculture constitutes a primary source of food production, economic growth and employment in India, but the sector is confronted with low farm productivity and yields, aggravated by increased pressure on natural resources and adverse climate change variability. Efforts involving the green revolution, land irrigation, improved seeds and organic farming have yielded suboptimal outcomes. The adoption of innovative computational solutions such as crop recommendation systems is considered a new frontier to provide insights and help farmers adapt and address the challenge of low productivity. However, existing agricultural recommendation systems have predominantly focused on environmental factors and narrow geographical coverage in India, resulting in limited and less robust predictions of suitable crops with both maximum yields and profits. This work incorporates both environmental and economic factors and 19 crop varieties across 15 states as input parameters to develop and evaluate two recommendation modules, Random Forest (RF) and Support Vector Machines (SVM), using 10-fold Cross Validation, Time-series Split and Lag Variables approaches. Results show that the 10-fold cross validation approach produced exceptionally high accuracy (RF: 99.96%, SVM: 94.71%), raising concerns of overfitting. However, the introduction of temporal order, which aligns more with real-world scenarios, reduces model performance (RF: 78.55%, SVM: 71.18%) in the Time-series Split approach. To further increase model accuracy while maintaining the temporal order, the Lag Variables approach was employed, which improved performance (RF: 83.62%, SVM: 74.38%) over the Time-series Split approach.
Consequently, the study identifies the Random Forest model developed with the Lag Variables approach as the preferred algorithm for optimal crop recommendation in the Indian context.

Keywords: Crop recommendation model; Random forest; Support vector machines; Indian agriculture; Exploratory data analysis

1. Introduction

Agriculture is not only fundamental for food production but also constitutes a primary source of economic growth, employment and improvement of the wellbeing of many people globally. For example, the World Bank reports that agriculture constitutes about 4% of the world's total gross domestic product (GDP), and in certain least developed nations, its contribution to GDP exceeds 25%.
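As a sketch of the two ideas that separate this paper's evaluation settings (the feature names and values below are illustrative, not the authors' dataset or code): lag variables append the previous period's observations to each row, and a time-series split trains only on the past, unlike shuffled k-fold cross validation.

```python
import numpy as np

# Hypothetical yearly records, ordered chronologically as in the paper's setup.
rng = np.random.default_rng(0)
years = np.arange(2000, 2020)
rainfall = rng.normal(800, 100, size=years.size)   # environmental factor (assumed)
price = rng.normal(50, 5, size=years.size)         # economic factor (assumed)
crop = rng.integers(0, 3, size=years.size)         # crop label (assumed)

# Lag Variables: augment each year's features with the previous year's values,
# giving the model temporal context without shuffling the chronological order.
X = np.column_stack([rainfall[1:], price[1:], rainfall[:-1], price[:-1]])
y = crop[1:]

# Time-series split: train strictly on earlier years, test on later ones.
# Shuffled 10-fold CV would leak future information into training folds,
# which is one plausible source of the near-perfect accuracies reported.
cut = int(0.8 * len(y))
X_train, X_test = X[:cut], X[cut:]
y_train, y_test = y[:cut], y[cut:]
```

Either RF or SVM can then be fit on `X_train` and scored on `X_test`; only the splitting and feature construction are shown here.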


Machine Learning Techniques for Multifactor Analysis of National Carbon Dioxide Emissions

Xie, Wenjia, Li, Jinhui, Zong, Kai, Seco, Luis

arXiv.org Artificial Intelligence

This paper presents a comprehensive study leveraging Support Vector Machine (SVM) regression and Principal Component Regression (PCR) to analyze carbon dioxide emissions in a global dataset of 62 countries and their dependence on idiosyncratic, country-specific parameters. The objective is to understand the factors contributing to carbon dioxide emissions and identify the most predictive elements. The analysis provides country-specific emission estimates, highlighting diverse national trajectories and pinpointing areas for targeted interventions in climate change mitigation, sustainable development, and the growing carbon credit markets and green finance sector. The study aims to support policymaking with accurate representations of carbon dioxide emissions, offering nuanced information for formulating effective strategies to address climate change while informing initiatives related to carbon trading and environmentally sustainable investments.


Fréchet regression for multi-label feature selection with implicit regularization

Mansouri, Dou El Kefel, Benkabou, Seif-Eddine, Benabdeslem, Khalid

arXiv.org Machine Learning

Fréchet regression, an extension of classical linear regression to general metric spaces, offers a robust framework for modeling complex relationships between variables when the responses lie outside of Euclidean spaces. This approach is especially well suited to high-dimensional datasets, such as vector representations, with particular relevance to fields like imaging, where capturing nonlinear dependencies and the intrinsic data structure is critical for accurate modeling (Fréchet (1948), Petersen and Müller (2019), Bhattacharjee and Müller (2023), Qiu, Yu and Zhu (2024)). A significant consideration in Fréchet regression arises when predicting multiple responses simultaneously, as seen in multi-target or multidimensional problems (Zhang and Zhou (2007), Hyvönen, Jääsaari and Roos (2024)). Unlike traditional regression, where each observation corresponds to a single response, Fréchet regression can be extended to model complex interactions between multiple outputs. This ability to address complex relationships between several responses opens new avenues, particularly in fields such as bioinformatics (Huang et al. (2005)) and image analysis (Lathuilière et al. (2019)), where multidimensional data and interdependencies between responses require adaptive and specialized methodologies. However, to date, the handling of multilabel scenarios within the context of Fréchet regression remains relatively unexplored in the literature, despite its potential significance in addressing complex, multidimensional applications. In this paper, we present an extension of the Global Fréchet regression model, a specific variant of Fréchet regression that generalizes classical multiple linear regression by modeling responses as random objects. This extension enables the explicit modeling of relationships between input variables and multiple responses, thereby addressing the multi-label setting. 
Our second contribution in this paper addresses the dimensionality challenge in the context of the proposed Fréchet regression extension.
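For reference, the Global Fréchet regression model that the paper extends (Petersen and Müller (2019), cited above) replaces the conditional mean of classical linear regression with a weighted Fréchet mean:

```latex
\hat{m}(x) \;=\; \operatorname*{arg\,min}_{\omega \in \Omega}\;
\mathbb{E}\!\left[\, s(X, x)\, d^{2}(Y, \omega) \,\right],
\qquad
s(X, x) \;=\; 1 + (X - \mu)^{\top} \Sigma^{-1} (x - \mu),
```

where d is the metric on the response space Ω, μ = E[X] and Σ = Cov(X). When Ω = ℝ with the Euclidean metric, this reduces to classical multiple linear regression, which is the sense in which Fréchet regression generalizes it.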


Table2Image: Interpretable Tabular Data Classification with Realistic Image Transformations

Lee, Seungeun, Oh, Seungsang

arXiv.org Artificial Intelligence

Recent advancements in deep learning for tabular data have demonstrated promising performance, yet interpretable models remain limited, with many relying on complex and large-scale architectures. This paper introduces Table2Image, an interpretable framework that transforms tabular data into realistic image representations for classification, achieving competitive performance with relatively lightweight models. Additionally, we propose variance inflation factor (VIF) initialization, which reflects the statistical properties of the data, and a novel interpretability framework that integrates insights from both the original tabular data and its image transformations. By leveraging Shapley additive explanations (SHAP) with methods to minimize distributional discrepancies, our approach combines tabular and image-based representations. Experiments on benchmark datasets showcase competitive classification accuracy, area under the curve (AUC), and improved interpretability, offering a scalable and reliable solution. Our code is available at https://github.com/duneag2/table2image.
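The variance inflation factor underlying the paper's VIF initialization is a standard statistic: for each feature, regress it on the remaining features and set VIF = 1/(1 − R²). A self-contained numpy sketch of the statistic itself (how Table2Image maps it into initialization weights is not reproduced here):

```python
import numpy as np

def vif(X):
    """Variance inflation factor per column: VIF_j = 1 / (1 - R_j^2),
    where R_j^2 comes from regressing column j on the other columns."""
    X = np.asarray(X, dtype=float)
    out = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(len(y)), others])   # intercept term
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - resid.var() / y.var()
        out[j] = 1.0 / (1.0 - r2)
    return out
```

Independent columns give VIF near 1; near-duplicate columns give very large values, so VIF-aware initialization encodes how redundant each tabular feature is.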


Cognitive phantoms in LLMs through the lens of latent variables

Peereboom, Sanne, Schwabe, Inga, Kleinberg, Bennett

arXiv.org Artificial Intelligence

Large language models (LLMs) increasingly reach real-world applications, necessitating a better understanding of their behaviour. Their size and complexity complicate traditional assessment methods, causing the emergence of alternative approaches inspired by the field of psychology. Recent studies administering psychometric questionnaires to LLMs report human-like traits in LLMs, potentially influencing LLM behaviour. However, this approach suffers from a validity problem: it presupposes that these traits exist in LLMs and that they are measurable with tools designed for humans. Typical procedures rarely acknowledge the validity problem in LLMs, comparing and interpreting average LLM scores. This study investigates this problem by comparing latent structures of personality between humans and three LLMs using two validated personality questionnaires. Findings suggest that questionnaires designed for humans do not validly measure similar constructs in LLMs, and that these constructs may not exist in LLMs at all, highlighting the need for psychometric analyses of LLM responses to avoid chasing cognitive phantoms. Keywords: large language models, psychometrics, machine behaviour, latent variable modeling, validity
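The abstract does not name the index used to compare latent structures, but a standard tool for this kind of cross-group factor comparison is Tucker's congruence coefficient, sketched below (purely illustrative, not the paper's analysis code):

```python
import numpy as np

def tucker_congruence(a, b):
    """Tucker's congruence coefficient between two factor-loading vectors:
    phi = sum(a*b) / sqrt(sum(a^2) * sum(b^2)).
    By common convention, phi >= .95 indicates factor equivalence and
    phi < .85 indicates non-matching structures."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))
```

Applied to loadings estimated separately from human and LLM responses, a low coefficient is one way the claim "questionnaires do not measure similar constructs in LLMs" can be made quantitative.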


Impact of Comprehensive Data Preprocessing on Predictive Modelling of COVID-19 Mortality

Das, Sangita, Maji, Subhrajyoti

arXiv.org Artificial Intelligence

Accurate predictive models are crucial for analysing COVID-19 mortality trends. This study evaluates the impact of a custom data preprocessing pipeline on ten machine learning models predicting COVID-19 mortality using data from Our World in Data (OWID). Our pipeline differs from a standard preprocessing pipeline through four key steps. Firstly, it transforms weekly reported totals into daily updates, correcting reporting biases and providing more accurate estimates. Secondly, it uses localised outlier detection and processing to preserve data variance and enhance accuracy. Thirdly, it utilises computational dependencies among columns to ensure data consistency. Finally, it incorporates an iterative feature selection process to optimise the feature set and improve model performance. Results show a significant improvement with the custom pipeline: the MLP Regressor achieved a test RMSE of 66.556 and a test R-squared of 0.991, surpassing the DecisionTree Regressor from the standard pipeline, which had a test RMSE of 222.858 and a test R-squared of 0.817. These findings highlight the importance of tailored preprocessing techniques in enhancing predictive modelling accuracy for COVID-19 mortality. Although specific to this study, these methodologies offer valuable insights into diverse datasets and domains, improving predictive performance across various contexts.
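The first preprocessing step, turning weekly reported totals into daily updates, can be sketched in its simplest form as spreading each weekly total evenly over its seven days (the paper's actual correction for reporting bias may be more elaborate than this uniform allocation):

```python
import numpy as np

def weekly_to_daily(weekly_totals):
    """Spread each weekly reported total evenly over its seven days --
    a minimal version of converting weekly dumps into daily estimates."""
    weekly_totals = np.asarray(weekly_totals, dtype=float)
    return np.repeat(weekly_totals / 7.0, 7)

daily = weekly_to_daily([700, 1400])   # two hypothetical reporting weeks
```

The transformation conserves the reported totals while removing the artificial once-a-week spikes that would otherwise distort daily mortality models.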


Explainable Artificial Intelligence and Multicollinearity: A Mini Review of Current Approaches

Salih, Ahmed M

arXiv.org Machine Learning

Explainable Artificial Intelligence (XAI) methods help to understand the internal mechanisms of machine learning models and how they reach a specific decision or take a specific action. The list of informative features is one of the most common outputs of XAI methods. Multicollinearity is one of the major issues that should be considered when XAI generates explanations in terms of the most informative features in an AI system. No review has been dedicated to investigating the current approaches to handling this significant issue. In this paper, we provide a review of the current state-of-the-art approaches to XAI in the context of recent advances in dealing with the multicollinearity issue. To do so, we searched three repositories, namely Web of Science, Scopus and IEEE Xplore, to find pertinent published papers. After excluding irrelevant papers, seven papers were considered in the review. In addition, we discuss the current XAI methods and their limitations in dealing with multicollinearity, and suggest future directions.
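Why multicollinearity undermines feature-importance explanations can be shown in a few lines of numpy (a toy illustration, not drawn from any of the seven reviewed papers): with two near-duplicate features, only their combined effect is identifiable, so any split of credit between them is essentially arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)            # near-duplicate of x1
y = 3.0 * x1 + rng.normal(scale=0.1, size=n)   # y truly depends on x1 only

A = np.column_stack([x1, x2])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# Only the sum coef[0] + coef[1] (about 3) is stably estimated; how the
# credit splits between the two near-duplicates is poorly determined.
# Feature rankings produced by XAI methods inherit exactly this ambiguity.
```

The same pathology appears in perturbation-based explainers, where perturbing one of two correlated features produces unrealistic off-manifold inputs.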


Data-driven Power Flow Linearization: Theory

Jia, Mengshuo, Hug, Gabriela, Zhang, Ning, Wang, Zhaojian, Wang, Yi, Kang, Chongqing

arXiv.org Artificial Intelligence

This two-part tutorial dives into the field of data-driven power flow linearization (DPFL), a domain gaining increased attention. DPFL stands out for its higher approximation accuracy, wide adaptability, and better ability to implicitly incorporate the latest system attributes. This renders DPFL a potentially superior option for managing the significant fluctuations from renewable energy sources, a step towards realizing a more sustainable energy future, by translating the higher model accuracy into increased economic efficiency and lower energy losses. To conduct a deep and rigorous reexamination, this tutorial first classifies existing DPFL methods into DPFL training algorithms and supportive techniques. Their mathematical models, analytical solutions, capabilities, limitations, and generalizability are systematically examined, discussed, and summarized. In addition, this tutorial reviews existing DPFL experiments, examining the settings of test systems, the fidelity of datasets, and the comparisons made among a limited number of DPFL methods. Further, this tutorial implements extensive numerical comparisons of all existing DPFL methods (40 methods in total) and four classic physics-driven approaches, focusing on their generalizability, applicability, accuracy, and computational efficiency. Through these simulations, this tutorial aims to reveal the actual performance of all the methods (including their performance when exposed to data noise or outliers), guiding the selection of appropriate linearization methods. Furthermore, this tutorial discusses future directions based on the theoretical and numerical insights gained. As the first part, this paper reexamines DPFL theories, covering all the training algorithms and supportive techniques. Capabilities, limitations, and aspects of generalizability, which were previously unmentioned in the literature, have been identified.
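The core of the regression-based DPFL training algorithms the tutorial classifies can be sketched in a toy form: learn a linear map from nodal power injections to voltage magnitudes purely from operating-point samples. Everything below is illustrative (the "AC" model is a made-up mildly nonlinear surrogate, not an actual power flow solver):

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_bus = 200, 4
P = rng.uniform(-1.0, 1.0, size=(n_samples, n_bus))   # injections (p.u., assumed)
V = 1.0 - 0.05 * P + 0.01 * P**2                      # surrogate "AC" voltages

# Ordinary least squares with an intercept column; the DPFL literature the
# tutorial reviews swaps in ridge, partial least squares, etc., to cope
# with collinear measurements and noise.
A = np.column_stack([np.ones(n_samples), P])
B, *_ = np.linalg.lstsq(A, V, rcond=None)

V_hat = A @ B
rmse = float(np.sqrt(np.mean((V - V_hat) ** 2)))
```

Because the fitted map is estimated from samples rather than derived from network equations, it automatically absorbs whatever operating conditions generated the data, which is the adaptability advantage the tutorial attributes to DPFL.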