AITopics

2407.0369

Country:

North America > United States (0.14)
North America > Cuba (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.89)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.55)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Holzmann, Hajo, Meister, Alexander

Multivariate root-n-consistent smoothing parameter free matching estimators and estimators of inverse density weighted expectations

arXiv.org Machine Learning, Jul-11-2024

Expected values weighted by the inverse of a multivariate density or, equivalently, Lebesgue integrals of regression functions with multivariate regressors occur in various areas of application, including estimating average treatment effects, nonparametric estimators in random coefficient regression models, or deconvolution estimators in Berkson errors-in-variables models. The frequently used nearest-neighbor and matching estimators suffer from bias problems in multiple dimensions. By using polynomial least squares fits on each cell of the $K^{\text{th}}$-order Voronoi tessellation for sufficiently large $K$, we develop novel modifications of nearest-neighbor and matching estimators which again converge at the parametric $\sqrt{n}$-rate under mild smoothness assumptions on the unknown regression function and without any smoothness conditions on the unknown density of the covariates. We stress that, in contrast to competing methods for correcting the bias of matching estimators, our estimators do not involve nonparametric function estimators and in particular do not rely on sample-size dependent smoothing parameters. We complement the upper bounds with appropriate lower bounds derived from information-theoretic arguments, which show that some smoothness of the regression function is indeed required to achieve the parametric rate. Simulations illustrate the practical feasibility of the proposed methods.

estimator, exp, regression function, (15 more...)

2407.08494

Country: Europe > Germany (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.96)

arXiv.org Machine Learning, Jul-10-2024

How Inverse Conditional Flows Can Serve as a Substitute for Distributional Regression

Kook, Lucas, Kolb, Chris, Schiele, Philipp, Dold, Daniel, Arpogaus, Marcel, Fritz, Cornelius, Baumann, Philipp F., Kopper, Philipp, Pielok, Tobias, Dorigatti, Emilio, Rügamer, David

Neural network representations of simple models, such as linear regression, are increasingly being studied to better understand the underlying principles of deep learning algorithms. However, neural representations of distributional regression models, such as the Cox model, have received little attention so far. We close this gap by proposing a framework for distributional regression using inverse flow transformations (DRIFT), which includes neural representations of the aforementioned models. We empirically demonstrate that the neural representations of models in DRIFT can serve as a substitute for their classical statistical counterparts in several applications involving continuous, ordered, time-series, and survival outcomes. We confirm that models in DRIFT empirically match the performance of several statistical methods in terms of estimation of partial effects, prediction, and aleatoric uncertainty quantification. DRIFT covers both interpretable statistical models and flexible neural networks, opening up new avenues in both statistical modeling and deep learning.

assumption, dataset, neural network, (15 more...)

2405.05429

Country:

North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.05)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
North America > United States > Pennsylvania (0.04)
(6 more...)

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Industry: Health & Medicine (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Mostowsky, Peter, Dutordoir, Vincent, Azangulov, Iskander, Jaquier, Noémie, Hutchinson, Michael John, Ravuri, Aditya, Rozo, Leonel, Terenin, Alexander, Borovitskiy, Viacheslav

The GeometricKernels Package: Heat and Matérn Kernels for Geometric Learning on Manifolds, Meshes, and Graphs

arXiv.org Machine Learning, Jul-10-2024

Kernels are a fundamental technical primitive in machine learning. In recent years, kernel-based methods such as Gaussian processes have become increasingly important in applications where quantifying uncertainty is of key interest. In settings that involve structured data defined on graphs, meshes, manifolds, or other related spaces, defining kernels with good uncertainty-quantification behavior, and computing their value numerically, is less straightforward than in the Euclidean setting. To address this difficulty, we present GeometricKernels, a software package which implements the geometric analogs of classical Euclidean squared exponential - also known as heat - and Matérn kernels, which are widely used in settings where uncertainty is of key interest. As a byproduct, we obtain the ability to compute Fourier-feature-type expansions, which are widely used in their own right, on a wide set of geometric spaces. Our implementation supports automatic differentiation in every major current framework simultaneously via a backend-agnostic design. In this companion paper to the package and its documentation, we outline the capabilities of the package and present an illustrated example of its interface. We also include a brief overview of the theory the package is built upon and provide some historic context in the appendix.

gaussian process, kernel, manifold, (13 more...)

2407.08086

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > Switzerland > Zürich > Zürich (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(2 more...)

Genre:

Overview (0.66)
Research Report (0.50)

Industry: Health & Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Jabbar, Abdul, Jalil, Syed Qaisar

A Comprehensive Analysis of Machine Learning Models for Algorithmic Trading of Bitcoin

arXiv.org Artificial Intelligence, Jul-9-2024

This study evaluates the performance of 41 machine learning models, including 21 classifiers and 20 regressors, in predicting Bitcoin prices for algorithmic trading. By examining these models under various market conditions, we highlight their accuracy, robustness, and adaptability to the volatile cryptocurrency market. Our comprehensive analysis reveals the strengths and limitations of each model, providing critical insights for developing effective trading strategies. We employ both machine learning metrics (e.g., Mean Absolute Error, Root Mean Squared Error) and trading metrics (e.g., Profit and Loss percentage, Sharpe Ratio) to assess model performance. Our evaluation includes backtesting on historical data, forward testing on recent unseen data, and real-world trading scenarios, ensuring the robustness and practical applicability of our models. Key findings demonstrate that certain models, such as Random Forest and Stochastic Gradient Descent, outperform others in terms of profit and risk management. These insights offer valuable guidance for traders and researchers aiming to leverage machine learning for cryptocurrency trading.
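The two families of metrics named in the abstract above can be sketched in a few lines. This is only an illustrative sketch: the function names are hypothetical, and the Sharpe ratio shown is a plain per-period, unannualized version using the sample standard deviation, which is an assumption, since the abstract does not state the exact definition the authors use.

```python
import math
from statistics import mean, stdev


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def root_mean_squared_error(y_true, y_pred):
    """Square root of the average squared prediction error."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def sharpe_ratio(returns, risk_free_rate=0.0):
    """Mean excess return divided by its sample standard deviation.

    Assumption: per-period and unannualized; needs at least two returns.
    """
    excess = [r - risk_free_rate for r in returns]
    spread = stdev(excess)
    if spread == 0:
        return float("nan")  # undefined when returns do not vary
    return mean(excess) / spread
```

A model can score well on the error metrics and still trade poorly, which is why the two kinds of metric are reported side by side.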

dataset, market condition, prediction, (15 more...)

2407.18334

Country:

North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.05)
North America > United States > Hawaii (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > e-Commerce > Financial Technology (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)

arXiv.org Artificial Intelligence, Jul-9-2024

Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps

Chuang, Yung-Sung, Qiu, Linlu, Hsieh, Cheng-Yu, Krishna, Ranjay, Kim, Yoon, Glass, James

When asked to summarize articles or answer questions given a passage, large language models (LLMs) can hallucinate details and respond with unsubstantiated answers that are inaccurate with respect to the input context. This paper describes a simple approach for detecting such contextual hallucinations. We hypothesize that contextual hallucinations are related to the extent to which an LLM attends to information in the provided context versus its own generations. Based on this intuition, we propose a simple hallucination detection model whose input features are given by the ratio of attention weights on the context versus newly generated tokens (for each attention head). We find that a linear classifier based on these lookback ratio features is as effective as a richer detector that utilizes the entire hidden states of an LLM or a text-based entailment model. The lookback ratio-based detector -- Lookback Lens -- is found to transfer across tasks and even models, allowing a detector that is trained on a 7B model to be applied (without retraining) to a larger 13B model. We further apply this detector to mitigate contextual hallucinations, and find that a simple classifier-guided decoding approach is able to reduce the amount of hallucination, for example by 9.6% in the XSum summarization task.
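The feature described above, a per-head ratio of attention on the context versus attention on newly generated tokens, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, attention weights are passed as plain lists, and the ratio is normalized as context / (context + generated) using mean weights, a choice the abstract itself does not spell out.

```python
def lookback_ratio(attn_row, n_context):
    """Share of one head's attention that falls on the context.

    attn_row: attention weights of a single head at one decoding step,
              ordered as [context tokens..., generated tokens...].
    n_context: number of leading positions that belong to the context.
    Assumption: mean weights and a context / (context + generated) form.
    """
    if n_context <= 0 or n_context > len(attn_row):
        raise ValueError("n_context must be between 1 and len(attn_row)")
    context = attn_row[:n_context]
    generated = attn_row[n_context:]
    ctx_mean = sum(context) / len(context)
    gen_mean = sum(generated) / len(generated) if generated else 0.0
    total = ctx_mean + gen_mean
    return ctx_mean / total if total > 0 else 0.0


def lookback_features(attn_per_head, n_context):
    """One lookback ratio per attention head, usable as classifier input."""
    return [lookback_ratio(row, n_context) for row in attn_per_head]
```

The resulting feature vector (one value per head) is what a linear classifier such as logistic regression would consume in a detector of this kind.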

classifier, hallucination, lookback lens, (14 more...)

2407.07071

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Germany (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Media (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)

Cloos, Nathan, Li, Moufan, Siegel, Markus, Brincat, Scott L., Miller, Earl K., Yang, Guangyu Robert, Cueva, Christopher J.

Differentiable Optimization of Similarity Scores Between Models and Brains

arXiv.org Artificial Intelligence, Jul-9-2024

What metrics should guide the development of more realistic models of the brain? One proposal is to quantify the similarity between models and brains using methods such as linear regression, Centered Kernel Alignment (CKA), and angular Procrustes distance. To better understand the limitations of these similarity measures, we analyze neural activity recorded in five experiments on nonhuman primates, and optimize synthetic datasets to become more similar to these neural recordings. How similar can these synthetic datasets be to neural activity while failing to encode task-relevant variables? We find that some measures, such as linear regression and CKA, differ from angular Procrustes and yield high similarity scores even when task-relevant variables cannot be linearly decoded from the synthetic datasets. Synthetic datasets optimized to maximize similarity scores initially learn the first principal component of the target dataset, but angular Procrustes captures higher variance dimensions much earlier than methods like linear regression and CKA. We show in both theory and simulations how these scores change when different principal components are perturbed. Finally, we jointly optimize multiple similarity scores to find their allowed ranges, and show that a high angular Procrustes similarity, for example, implies a high CKA score, but not the converse.
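One of the similarity measures named above, CKA, has a compact linear form that can be sketched directly. The code below is an illustrative sketch, not the authors' implementation: it assumes the standard linear-kernel CKA on column-centered data matrices (rows are samples, columns are units or neurons), and the helper names are hypothetical.

```python
import math


def center_columns(matrix):
    """Subtract each column's mean so every feature has zero mean."""
    n_rows = len(matrix)
    means = [sum(col) / n_rows for col in zip(*matrix)]
    return [[v - mu for v, mu in zip(row, means)] for row in matrix]


def cross_product_sq_norm(a, b):
    """Squared Frobenius norm of a^T b, computed column by column."""
    total = 0.0
    for col_a in zip(*a):
        for col_b in zip(*b):
            dot = sum(x * y for x, y in zip(col_a, col_b))
            total += dot * dot
    return total


def linear_cka(x, y):
    """Linear CKA: ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)."""
    xc, yc = center_columns(x), center_columns(y)
    denom = math.sqrt(cross_product_sq_norm(xc, xc) * cross_product_sq_norm(yc, yc))
    if denom == 0:
        raise ValueError("CKA is undefined for constant data")
    return cross_product_sq_norm(xc, yc) / denom
```

The score lies in [0, 1] and is unchanged by isotropic rescaling of either dataset, so a rescaled copy of the data still scores 1.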

dataset, principal component, similarity score, (14 more...)

2407.07059

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > France (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.77)

arXiv.org Machine Learning, Jul-9-2024

Distributionally robust risk evaluation with an isotonic constraint

Gui, Yu, Barber, Rina Foygel, Ma, Cong

Statistical learning under distribution shift is challenging when neither prior knowledge nor fully accessible data from the target distribution is available. Distributionally robust learning (DRL) aims to control the worst-case statistical performance within an uncertainty set of candidate distributions, but how to properly specify the set remains challenging. To enable distributional robustness without being overly conservative, in this paper, we propose a shape-constrained approach to DRL, which incorporates prior information about the way in which the unknown target distribution differs from its estimate. More specifically, we assume the unknown density ratio between the target distribution and its estimate is isotonic with respect to some partial order. At the population level, we provide a solution to the shape-constrained optimization problem that does not involve the isotonic constraint. At the sample level, we provide consistency results for an empirical estimator of the target in a range of different settings. Empirical studies on both synthetic and real data examples demonstrate the improved accuracy of the proposed shape-constrained approach.

constraint, distribution shift, iso, (15 more...)

2407.06867

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.45)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Mitra, Pallavi, Biessmann, Felix

Automated Computational Energy Minimization of ML Algorithms using Constrained Bayesian Optimization

arXiv.org Artificial Intelligence, Jul-8-2024

Bayesian optimization (BO) is an efficient framework for optimization of black-box objectives when function evaluations are costly and gradient information is not easily accessible. BO has been successfully applied to automate the task of hyperparameter optimization (HPO) in machine learning (ML) models with the primary objective of optimizing predictive performance on held-out data. In recent years, however, with ever-growing model sizes, the energy cost associated with model training has become an important factor for ML applications. Here we evaluate Constrained Bayesian Optimization (CBO) with the primary objective of minimizing energy consumption and subject to the constraint that the generalization performance is above some threshold. We evaluate our approach on regression and classification tasks and demonstrate that CBO achieves lower energy consumption without compromising the predictive performance of ML models.

energy consumption, hyperparameter, unconstrained bo, (12 more...)

2407.05788

Country:

North America > United States > California (0.04)
Europe > Germany > Berlin (0.04)

Genre: Research Report > New Finding (0.69)

Industry: Energy (0.61)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.31)

arXiv.org Machine Learning, Jul-8-2024

High-Dimensional Distributed Sparse Classification with Scalable Communication-Efficient Global Updates

Lu, Fred, Curtin, Ryan R., Raff, Edward, Ferraro, Francis, Holt, James

As the size of datasets used in statistical learning continues to grow, distributed training of models has attracted increasing attention. These methods partition the data and exploit parallelism to reduce memory and runtime, but suffer increasingly from communication costs as the data size or the number of iterations grows. Recent work on linear models has shown that a surrogate likelihood can be optimized locally to iteratively improve on an initial solution in a communication-efficient manner. However, existing versions of these methods experience multiple shortcomings as the data size becomes massive, including diverging updates and difficulty handling sparsity efficiently. In this work we develop solutions to these problems which enable us to learn a communication-efficient distributed logistic regression model even beyond millions of features. In our experiments we demonstrate a large improvement in accuracy over distributed algorithms with only a few distributed update steps needed, and similar or faster runtimes. Our code is available at https://github.com/FutureComputing4AI/ProxCSL.

dataset, objective, proxcsl, (13 more...)

doi: 10.1145/3637528.3672038

2407.06346

Country:

North America > United States > Maryland > Baltimore County (0.14)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.05)
Asia > Middle East > Jordan (0.05)
(7 more...)

Genre: Research Report > New Finding (0.56)

Industry: Information Technology > Security & Privacy (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.56)