AITopics | Regression

Collaborating Authors

Regression

News Overviews Instructional Materials AI-Alerts Classics

Diffusion Guided Language Modeling

Lovelace, Justin, Kishore, Varsha, Chen, Yiwei, Weinberger, Kilian Q.

arXiv.org Artificial IntelligenceAug-8-2024

Current language models demonstrate remarkable proficiency in text generation. However, for many applications it is desirable to control attributes, such as sentiment, or toxicity, of the generated language -- ideally tailored towards each specific use case and target audience. For auto-regressive language models, existing guidance methods are prone to decoding errors that cascade during generation and degrade performance. In contrast, text diffusion models can easily be guided with, for example, a simple linear sentiment classifier -- however they do suffer from significantly higher perplexity than auto-regressive alternatives. In this paper we use a guided diffusion model to produce a latent proposal that steers an auto-regressive language model to generate text with desired properties. Our model inherits the unmatched fluency of the auto-regressive approach and the plug-and-play flexibility of diffusion. We show that it outperforms previous plug-and-play guidance methods across a wide range of benchmark data sets. Further, controlling a new attribute in our framework is reduced to training a single logistic regression classifier.

continuation, dglm, diffusion model, (15 more...)

arXiv.org Artificial Intelligence

2408.0422

Country:

North America > United States > New York (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

How Transformers Utilize Multi-Head Attention in In-Context Learning? A Case Study on Sparse Linear Regression

Chen, Xingwu, Zhao, Lei, Zou, Difan

arXiv.org Artificial IntelligenceAug-8-2024

Despite the remarkable success of transformer-based models in various real-world tasks, their underlying mechanisms remain poorly understood. Recent studies have suggested that transformers can implement gradient descent as an in-context learner for linear regression problems and have developed various theoretical analyses accordingly. However, these works mostly focus on the expressive power of transformers by designing specific parameter constructions, lacking a comprehensive understanding of their inherent working mechanisms post-training. In this study, we consider a sparse linear regression problem and investigate how a trained multi-head transformer performs in-context learning. We experimentally discover that the utilization of multi-heads exhibits different patterns across layers: multiple heads are utilized and essential in the first layer, while usually only a single head is sufficient for subsequent layers. We provide a theoretical explanation for this observation: the first layer preprocesses the context data, and the following layers execute simple optimization steps based on the preprocessed context. Moreover, we demonstrate that such a preprocess-then-optimize algorithm can significantly outperform naive gradient descent and ridge regression algorithms. Further experimental results support our explanations. Our findings offer insights into the benefits of multi-head attention and contribute to understanding the more intricate mechanisms hidden within trained transformers.

algorithm, probability, transformer, (11 more...)

arXiv.org Artificial Intelligence

2408.04532

Country:

Asia > China > Hong Kong (0.04)
North America > United States > California > Orange County > Irvine (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

Add feedback

The Data Addition Dilemma

Shen, Judy Hanwen, Raji, Inioluwa Deborah, Chen, Irene Y.

arXiv.org Artificial IntelligenceAug-7-2024

In many machine learning for healthcare tasks, standard datasets are constructed by amassing data across many, often fundamentally dissimilar, sources. But when does adding more data help, and when does it hinder progress on desired model outcomes in real-world settings? We identify this situation as the \textit{Data Addition Dilemma}, demonstrating that adding training data in this multi-source scaling context can at times result in reduced overall accuracy, uncertain fairness outcomes, and reduced worst-subgroup performance. We find that this possibly arises from an empirically observed trade-off between model performance improvements due to data scaling and model deterioration from distribution shift. We thus establish baseline strategies for navigating this dilemma, introducing distribution shift heuristics to guide decision-making on which data sources to add in data scaling, in order to yield the expected model performance improvements. We conclude with a discussion of the required considerations for data collection and suggestions for studying data composition and scale in the age of increasingly larger models.

dataset, distribution shift, hospital, (12 more...)

arXiv.org Artificial Intelligence

2408.04154

Country:

North America > United States > South Dakota (0.04)
North America > United States > Louisiana (0.04)
North America > United States > Pennsylvania (0.04)
(8 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Health Care Technology (0.67)
Health & Medicine > Therapeutic Area > Neurology (0.46)
Health & Medicine > Diagnostic Medicine > Imaging (0.45)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Add feedback

On the Generalization for Transfer Learning: An Information-Theoretic Analysis

Wu, Xuetong, Manton, Jonathan H., Aickelin, Uwe, Zhu, Jingge

arXiv.org Artificial IntelligenceAug-7-2024

Transfer learning, or domain adaptation, is concerned with machine learning problems in which training and testing data come from possibly different probability distributions. In this work, we give an information-theoretic analysis of the generalization error and excess risk of transfer learning algorithms. Our results suggest, perhaps as expected, that the Kullback-Leibler (KL) divergence $D(\mu\|\mu')$ plays an important role in the characterizations where $\mu$ and $\mu'$ denote the distribution of the training data and the testing data, respectively. Specifically, we provide generalization error and excess risk upper bounds for learning algorithms where data from both distributions are available in the training phase. Recognizing that the bounds could be sub-optimal in general, we provide improved excess risk upper bounds for a certain class of algorithms, including the empirical risk minimization (ERM) algorithm, by making stronger assumptions through the \textit{central condition}. To demonstrate the usefulness of the bounds, we further extend the analysis to the Gibbs algorithm and the noisy stochastic gradient descent method. We then generalize the mutual information bound with other divergences such as $\phi$-divergence and Wasserstein distance, which may lead to tighter bounds and can handle the case when $\mu$ is not absolutely continuous with respect to $\mu'$. Several numerical results are provided to demonstrate our theoretical findings. Lastly, to address the problem that the bounds are often not directly applicable in practice due to the absence of the distributional knowledge of the data, we develop an algorithm (called InfoBoost) that dynamically adjusts the importance weights for both source and target data based on certain information measures. The empirical results show the effectiveness of the proposed algorithm.

algorithm, generalization error, mutual information, (14 more...)

arXiv.org Artificial Intelligence

2207.05377

Country:

Oceania > Australia > Victoria > Melbourne (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China > Shanghai > Shanghai (0.04)
(5 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Education (1.00)
Government > Regional Government (0.67)
Health & Medicine > Therapeutic Area (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Add feedback

Sensitivity analysis using the Metamodel of Optimal Prognosis

Most, Thomas, Will, Johannes

arXiv.org Machine LearningAug-7-2024

In real case applications within the virtual prototyping process, it is not always possible to reduce the complexity of the physical models and to obtain numerical models which can be solved quickly. Usually, every single numerical simulation takes hours or even days. Although the progresses in numerical methods and high performance computing, in such cases, it is not possible to explore various model configurations, hence efficient surrogate models are required. Generally the available meta-model techniques show several advantages and disadvantages depending on the investigated problem. In this paper we present an automatic approach for the selection of the optimal suitable meta-model for the actual problem. Together with an automatic reduction of the variable space using advanced filter techniques an efficient approximation is enabled also for high dimensional problems. This filter techniques enable a reduction of the high dimensional variable space to a much smaller subspace where meta-model-based sensitivity analyses are carried out to assess the influence of important variables and to identify the optimal subspace with corresponding surrogate model which enables the most accurate probabilistic analysis. For this purpose we investigate variance-based and moment-free sensitivity measures in combination with advanced meta-models as moving least squares and kriging.

approximation, input variable, variation, (15 more...)

arXiv.org Machine Learning

2408.0359

Country:

Europe > Germany (0.05)
North America > United States > Massachusetts > Middlesex County > Natick (0.04)
Europe > United Kingdom > England (0.04)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)
Information Technology > Mathematics of Computing (0.68)

Add feedback

Feature Clock: High-Dimensional Effects in Two-Dimensional Plots

Ovcharenko, Olga, Sevastjanova, Rita, Boeva, Valentina

arXiv.org Artificial IntelligenceAug-6-2024

Humans struggle to perceive and interpret high-dimensional data. Therefore, high-dimensional data are often projected into two dimensions for visualization. Many applications benefit from complex nonlinear dimensionality reduction techniques, but the effects of individual high-dimensional features are hard to explain in the two-dimensional space. Most visualization solutions use multiple two-dimensional plots, each showing the effect of one high-dimensional feature in two dimensions; this approach creates a need for a visual inspection of k plots for a k-dimensional input space. Our solution, Feature Clock, provides a novel approach that eliminates the need to inspect these k plots to grasp the influence of original features on the data structure depicted in two dimensions. Feature Clock enhances the explainability and compactness of visualizations of embedded data and is available in an open-source Python library.

contribution, feature clock, visualization, (17 more...)

arXiv.org Artificial Intelligence

2408.01294

Country: Europe > Switzerland > Zürich > Zürich (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Software (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)

Add feedback

Logistic Regression makes small LLMs strong and explainable "tens-of-shot" classifiers

Buckmann, Marcus, Hill, Edward

arXiv.org Machine LearningAug-6-2024

For simple classification tasks, we show that users can benefit from the advantages of using small, local, generative language models instead of large commercial models without a trade-off in performance or introducing extra labelling costs. These advantages, including those around privacy, availability, cost, and explainability, are important both in commercial applications and in the broader democratisation of AI. Through experiments on 17 sentence classification tasks (2-4 classes), we show that penalised logistic regression on the embeddings from a small LLM equals (and usually betters) the performance of a large LLM in the "tens-of-shot" regime. This requires no more labelled instances than are needed to validate the performance of the large LLM. Finally, we extract stable and sensible explanations for classification decisions.

accuracy 0, arxiv preprint arxiv, tweet, (13 more...)

arXiv.org Machine Learning

2408.03414

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > United Kingdom > England (0.04)
Europe > Poland (0.04)
(7 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Government (1.00)
Banking & Finance > Economy (1.00)
Information Technology (0.87)
Media (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.84)

Add feedback

Toward Smart Scheduling in Tapis

Stubbs, Joe, Padhy, Smruti, Cardone, Richard

arXiv.org Artificial IntelligenceAug-5-2024

The Tapis framework provides APIs for automating job execution on remote resources, including HPC clusters and servers running in the cloud. Tapis can simplify the interaction with remote cyberinfrastructure (CI), but the current services require users to specify the exact configuration of a job to run, including the system, queue, node count, and maximum run time, among other attributes. Moreover, the remote resources must be defined and configured in Tapis before a job can be submitted. In this paper, we present our efforts to develop an intelligent job scheduling capability in Tapis, where various attributes about a job configuration can be automatically determined for the user, and computational resources can be dynamically provisioned by Tapis for specific jobs. We develop an overall architecture for such a feature, which suggests a set of core challenges to be solved. Then, we focus on one such specific challenge: predicting queue times for a job on different HPC systems and queues, and we present two sets of results based on machine learning methods. Our first set of results cast the problem as a regression, which can be used to select the best system from a list of existing options. Our second set of results frames the problem as a classification, allowing us to compare the use of an existing system with a dynamically provisioned resource.

queue, queue time, tapis, (14 more...)

arXiv.org Artificial Intelligence

2408.03349

Country:

North America > United States > Texas > Travis County > Austin (0.15)
North America > United States > Texas > Shelby County > Center (0.05)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)

Add feedback

Adaptive Learning for Quantum Linear Regression

Carugno, Costantino, Dacrema, Maurizio Ferrari, Cremonesi, Paolo

arXiv.org Artificial IntelligenceAug-5-2024

The recent availability of quantum annealers as cloud-based services has enabled new ways to handle machine learning problems, and several relevant algorithms have been adapted to run on these devices. In a recent work, linear regression was formulated as a quadratic binary optimization problem that can be solved via quantum annealing. Although this approach promises a computational time advantage for large datasets, the quality of the solution is limited by the necessary use of a precision vector, used to approximate the real-numbered regression coefficients in the quantum formulation. In this work, we focus on the practical challenge of improving the precision vector encoding: instead of setting an array of generic values equal for all coefficients, we allow each one to be expressed by its specific precision, which is tuned with a simple adaptive algorithm. This approach is evaluated on synthetic datasets of increasing size, and linear regression is solved using the D-Wave Advantage quantum annealer, as well as classical solvers. To the best of our knowledge, this is the largest dataset ever evaluated for linear regression on a quantum annealer. The results show that our formulation is able to deliver improved solution quality in all instances, and could better exploit the potential of current quantum devices.

coefficient, dataset, precision vector, (13 more...)

arXiv.org Artificial Intelligence

2408.02833

Country:

Europe > Italy > Lombardy > Milan (0.04)
Europe > Finland (0.04)

Genre: Research Report > New Finding (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Add feedback

Embedding generalization within the learning dynamics: An approach based-on sample path large deviation theory

Befekadu, Getachew K.

arXiv.org Machine LearningAug-4-2024

In this paper, we consider a typical learning problem of point estimations for modeling of nonlinear functions or dynamical systems in which generalization, i.e., verifying a given learned model or estimated parameters, can be embedded as an integral part of the learning process or dynamics. In particular, we consider an empirical risk minimization based learning problem that exploits gradient methods from continuous-time perspective with small random perturbations, which is guided by the training dataset loss. Here, we provide an asymptotic probability estimate in the small noise limit based-on the Freidlin-Wentzell theory of large deviations, when the sample path of the random process corresponding to the randomly perturbed gradient dynamical system hits a certain target set, i.e., a rare event, when the latter is specified by the testing dataset loss landscape. Interestingly, the proposed framework can be viewed as one way of improving generalization and robustness in learning problems that provides new insights leading to optimal point estimates which is guided by training data loss, while, at the same time, the learning dynamics has an access to the testing dataset loss landscape in some form of future achievable or anticipated target goal. Moreover, as a by-product, we establish a connection with optimal control problem, where the target set, i.e., the rare event, is considered as the desired outcome or achievable target goal for a certain optimal control problem, for which we also provide a verification result reinforcing the rationale behind the proposed framework. Finally, we present a computational algorithm - a two-step iterative numerical scheme - that solves the corresponding variational problem, i.e., a large deviation minimizer problem, leading to an optimal point estimates and, as part of this work, we also present some numerical results for a typical case of nonlinear regression problem.

dynamical system, generalization, testing dataset loss landscape, (9 more...)

arXiv.org Machine Learning

2408.02167

Country:

North America > United States > New York (0.04)
North America > United States > District of Columbia > Washington (0.04)
Europe > United Kingdom > England > Bristol (0.04)
(2 more...)

Genre: Research Report > New Finding (0.34)

Industry: Education > Focused Education > Special Education (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

Add feedback