AITopics | Regression

Collaborating Authors

Regression

News Overviews Instructional Materials AI-Alerts Classics

Review for NeurIPS paper: Counterfactual Prediction for Bundle Treatment

Neural Information Processing SystemsFeb-7-2025, 11:28:45 GMT

Additional Feedback: As mentioned above, I think this method is very nice, but should be framed differently. In particular, the issue being addressed is not confounding _bias_; it is sample inefficiency when estimating the regression model f_{\theta_p}. This distinction is important in the causal inference literature, because a bias does not disappear with sample size. However, in this context, under the unconfoundedness assumption, if the model f_{\theta_p} is sufficiently flexible, it will converge to the same true counterfactual model in the large sample limit regardless of how the data are weighted (this is consistent with the experiments in the paper). In other words, the population risks E_{cf} and E_f w are minimized at the same function.

bundle treatment, counterfactual prediction, neurips paper, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.38)

Add feedback

Coherent Local Explanations for Mathematical Optimization

Otto, Daan, Kurtz, Jannis, Birbil, S. Ilker

arXiv.org Artificial IntelligenceFeb-7-2025

The surge of explainable artificial intelligence methods seeks to enhance transparency and explainability in machine learning models. At the same time, there is a growing demand for explaining decisions taken through complex algorithms used in mathematical optimization. However, current explanation methods do not take into account the structure of the underlying optimization problem, leading to unreliable outcomes. In response to this need, we introduce Coherent Local Explanations for Mathematical Optimization (CLEMO). CLEMO provides explanations for multiple components of optimization models, the objective value and decision variables, which are coherent with the underlying model structure. Our sampling-based procedure can provide explanations for the behavior of exact and heuristic solution algorithms. The effectiveness of CLEMO is illustrated by experiments for the shortest path problem, the knapsack problem, and the vehicle routing problem.

artificial intelligence, explanation, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2502.0484

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Netherlands > North Holland > Amsterdam (0.04)
North America > United States > New York > New York County > New York City (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry: Transportation (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)

Add feedback

Efficient distributional regression trees learning algorithms for calibrated non-parametric probabilistic forecasts

Quentin, Duchemin, Guillaume, Obozinski

arXiv.org Artificial IntelligenceFeb-7-2025

The perspective of developing trustworthy AI for critical applications in science and engineering requires machine learning techniques that are capable of estimating their own uncertainty. In the context of regression, instead of estimating a conditional mean, this can be achieved by producing a predictive interval for the output, or to even learn a model of the conditional probability $p(y|x)$ of an output $y$ given input features $x$. While this can be done under parametric assumptions with, e.g. generalized linear model, these are typically too strong, and non-parametric models offer flexible alternatives. In particular, for scalar outputs, learning directly a model of the conditional cumulative distribution function of $y$ given $x$ can lead to more precise probabilistic estimates, and the use of proper scoring rules such as the weighted interval score (WIS) and the continuous ranked probability score (CRPS) lead to better coverage and calibration properties. This paper introduces novel algorithms for learning probabilistic regression trees for the WIS or CRPS loss functions. These algorithms are made computationally efficient thanks to an appropriate use of known data structures - namely min-max heaps, weight-balanced binary trees and Fenwick trees. Through numerical experiments, we demonstrate that the performance of our methods is competitive with alternative approaches. Additionally, our methods benefit from the inherent interpretability and explainability of trees. As a by-product, we show how our trees can be used in the context of conformal prediction and explain why they are particularly well-suited for achieving group-conditional coverage guarantees.

artificial intelligence, bayesian inference, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2502.05157

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Industry:

Health & Medicine (0.67)
Leisure & Entertainment > Games (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Add feedback

Probing Internal Representations of Multi-Word Verbs in Large Language Models

Kissane, Hassane, Schilling, Achim, Krauss, Patrick

arXiv.org Artificial IntelligenceFeb-7-2025

This study investigates the internal representations of verb-particle combinations, called multi-word verbs, within transformer-based large language models (LLMs), specifically examining how these models capture lexical and syntactic properties at different neural network layers. Using the BERT architecture, we analyze the representations of its layers for two different verb-particle constructions: phrasal verbs like 'give up' and prepositional verbs like 'look at'. Our methodology includes training probing classifiers on the internal representations to classify these categories at both word and sentence levels. The results indicate that the model's middle layers achieve the highest classification accuracies. To further analyze the nature of these distinctions, we conduct a data separability test using the Generalized Discrimination Value (GDV). While GDV results show weak linear separability between the two verb types, probing classifiers still achieve high accuracy, suggesting that representations of these linguistic categories may be non-linearly separable. This aligns with previous research indicating that linguistic distinctions in neural networks are not always encoded in a linearly separable manner. These findings computationally support usage-based claims on the representation of verb-particle constructions and highlight the complex interaction between neural network architectures and linguistic structures.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2502.04789

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Germany > Bavaria > Middle Franconia > Nuremberg (0.05)
Oceania > Australia > Victoria > Melbourne (0.04)
(7 more...)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.31)

Add feedback

Review for NeurIPS paper: A convex optimization formulation for multivariate regression

Neural Information Processing SystemsFeb-6-2025, 07:02:28 GMT

Weaknesses: The major weaknesses of the paper are listed below: 1. There are some potential inaccuracies in the description of the algorithm. For example, in Section 3.1, the first equalities in the two lines of equations after line 210 should be \approx instead, right? And does the notation p_{\tau_B} ' denote the sub-gradient of p_{\tau_B}? In general, some more explanations about the linearization here would be helpful.

convex optimization formulation, initialization, multivariate regression, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.40)

Add feedback

Review for NeurIPS paper: A convex optimization formulation for multivariate regression

Neural Information Processing SystemsFeb-6-2025, 07:02:20 GMT

This paper proposes a new parametrization of the multivariate linear regression problem. It shows that under this new parametrization, it is easier to employ sparsity inducing penalty terms on the inverse covariance matrix. The paper suggests a sequential relaxation algorithm. The reviewers noted the novelty of the approach and numerous strengths. The simulation experiments (in the supplementary material) explore the method in the context of several connectivity scenarios. However, one weakness is the exploration of the performance of the model on real data scenarios.

convex optimization formulation, multivariate regression, neurips paper, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.76)

Add feedback

A Classification System Approach in Predicting Chinese Censorship

Prodani, Matt, Ze, Tianchu, Hu, Yushen

arXiv.org Artificial IntelligenceFeb-6-2025

This paper is dedicated to using a classifier to predict whether a Weibo post would be censored under the Chinese internet. Through randomized sampling from \citeauthor{Fu2021} and Chinese tokenizing strategies, we constructed a cleaned Chinese phrase dataset with binary censorship markings. Utilizing various probability-based information retrieval methods on the data, we were able to derive 4 logistic regression models for classification. Furthermore, we experimented with pre-trained transformers to perform similar classification tasks. After evaluating both the macro-F1 and ROC-AUC metrics, we concluded that the Fined-Tuned BERT model exceeds other strategies in performance.

information retrieval, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2502.04234

Country:

Asia > China > Beijing > Beijing (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre:

Research Report > New Finding (0.50)
Research Report > Experimental Study (0.36)

Industry: Law > Civil Rights & Constitutional Law (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.71)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.69)

Add feedback

Quantifying Correlations of Machine Learning Models

Li, Yuanyuan, Sarna, Neeraj, Lin, Yang

arXiv.org Artificial IntelligenceFeb-6-2025

Machine Learning models are being extensively used in safety critical applications where errors from these models could cause harm to the user. Such risks are amplified when multiple machine learning models, which are deployed concurrently, interact and make errors simultaneously. This paper explores three scenarios where error correlations between multiple models arise, resulting in such aggregated risks. Using real-world data, we simulate these scenarios and quantify the correlations in errors of different models. Our findings indicate that aggregated risks are substantial, particularly when models share similar algorithms, training datasets, or foundational models. Overall, we observe that correlations across models are pervasive and likely to intensify with increased reliance on foundational models and widely used public datasets, highlighting the need for effective mitigation strategies to address these challenges.

correlation, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2502.03937

Country:

North America > United States > California (0.05)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
North America > United States > Connecticut (0.04)
(2 more...)

Genre: Research Report > New Finding (0.49)

Industry:

Health & Medicine (0.93)
Energy (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)

Add feedback

SyMANTIC: An Efficient Symbolic Regression Method for Interpretable and Parsimonious Model Discovery in Science and Beyond

Muthyala, Madhav R., Sorourifar, Farshud, Peng, You, Paulson, Joel A.

arXiv.org Artificial IntelligenceFeb-5-2025

Symbolic regression (SR) is an emerging branch of machine learning focused on discovering simple and interpretable mathematical expressions from data. Although a wide-variety of SR methods have been developed, they often face challenges such as high computational cost, poor scalability with respect to the number of input dimensions, fragility to noise, and an inability to balance accuracy and complexity. This work introduces SyMANTIC, a novel SR algorithm that addresses these challenges. SyMANTIC efficiently identifies (potentially several) low-dimensional descriptors from a large set of candidates (from $\sim 10^5$ to $\sim 10^{10}$ or more) through a unique combination of mutual information-based feature selection, adaptive feature expansion, and recursively applied $\ell_0$-based sparse regression. In addition, it employs an information-theoretic measure to produce an approximate set of Pareto-optimal equations, each offering the best-found accuracy for a given complexity. Furthermore, our open-source implementation of SyMANTIC, built on the PyTorch ecosystem, facilitates easy installation and GPU acceleration. We demonstrate the effectiveness of SyMANTIC across a range of problems, including synthetic examples, scientific benchmarks, real-world material property predictions, and chaotic dynamical system identification from small datasets. Extensive comparisons show that SyMANTIC uncovers similar or more accurate models at a fraction of the cost of existing SR methods.

artificial intelligence, evolutionary algorithm, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2502.03367

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.45)

Industry:

Health & Medicine (1.00)
Energy > Energy Storage (1.00)
Materials > Chemicals (0.93)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.92)
(2 more...)

Add feedback

Type 2 Tobit Sample Selection Models with Bayesian Additive Regression Trees

O'Neill, Eoghan

arXiv.org Machine LearningFeb-5-2025

This paper introduces Type 2 Tobit Bayesian Additive Regression Trees (TOBART-2). BART can produce accurate individual-specific treatment effect estimates. However, in practice estimates are often biased by sample selection. We extend the Type 2 Tobit sample selection model to account for nonlinearities and model uncertainty by including sums of trees in both the selection and outcome equations. A Dirichlet Process Mixture distribution for the error terms allows for departure from the assumption of bivariate normally distributed errors. Soft trees and a Dirichlet prior on splitting probabilities improve modeling of smooth and sparse data generating processes. We include a simulation study and an application to the RAND Health Insurance Experiment data set.

artificial intelligence, bayesian inference, machine learning, (15 more...)

arXiv.org Machine Learning

2502.036

Country:

Europe > Netherlands > South Holland > Rotterdam (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > Ohio (0.04)
(2 more...)

Genre: Research Report > Experimental Study (1.00)

Industry: Banking & Finance (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)

Add feedback