AITopics | Regression

Collaborating Authors

Regression

News Overviews Instructional Materials AI-Alerts Classics

Addressing Dynamic and Sparse Qualitative Data: A Hilbert Space Embedding of Categorical Variables

arXiv.org Artificial IntelligenceAug-22-2023

We propose a novel framework for incorporating qualitative data into quantitative models for causal estimation. Previous methods use categorical variables derived from qualitative data to build quantitative models. However, this approach can lead to data-sparse categories and yield inconsistent (asymptotically biased) and imprecise (finite sample biased) estimates if the qualitative information is dynamic and intricate. We use functional analysis to create a more nuanced and flexible framework. We embed the observed categories into a latent Baire space and introduce a continuous linear map -- a Hilbert space embedding -- from the Baire space of categories to a Reproducing Kernel Hilbert Space (RKHS) of representation functions. Through the Riesz representation theorem, we establish that the canonical treatment of categorical variables in causal models can be transformed into an identified structure in the RKHS. Transfer learning acts as a catalyst to streamline estimation -- embeddings from traditional models are paired with the kernel trick to form the Hilbert space embedding. We validate our model through comprehensive simulation evidence and demonstrate its relevance in a real-world study that contrasts theoretical predictions from economics and psychology in an e-commerce marketplace. The results confirm the superior performance of our model, particularly in scenarios where qualitative information is nuanced and complex.

category, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2308.11781

Country:

Asia > Singapore (0.04)
Europe (0.04)
North America > United States > New York > Tompkins County > Ithaca (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area (0.68)
Banking & Finance > Trading (0.67)
Media > Film (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.93)
Information Technology > Artificial Intelligence > Natural Language (0.93)

Add feedback

EM for Mixture of Linear Regression with Clustered Data

Reisizadeh, Amirhossein, Gatmiry, Khashayar, Ozdaglar, Asuman

arXiv.org Artificial IntelligenceAug-22-2023

Modern data-driven and distributed learning frameworks deal with diverse massive data generated by clients spread across heterogeneous environments. Indeed, data heterogeneity is a major bottleneck in scaling up many distributed learning paradigms. In many settings however, heterogeneous data may be generated in clusters with shared structures, as is the case in several applications such as federated learning where a common latent variable governs the distribution of all the samples generated by a client. It is therefore natural to ask how the underlying clustered structures in distributed data can be exploited to improve learning schemes. In this paper, we tackle this question in the special case of estimating $d$-dimensional parameters of a two-component mixture of linear regressions problem where each of $m$ nodes generates $n$ samples with a shared latent variable. We employ the well-known Expectation-Maximization (EM) method to estimate the maximum likelihood parameters from $m$ batches of dependent samples each containing $n$ measurements. Discarding the clustered structure in the mixture model, EM is known to require $O(\log(mn/d))$ iterations to reach the statistical accuracy of $O(\sqrt{d/(mn)})$. In contrast, we show that if initialized properly, EM on the structured data requires only $O(1)$ iterations to reach the same statistical accuracy, as long as $m$ grows up as $e^{o(n)}$. Our analysis establishes and combines novel asymptotic optimization and generalization guarantees for population and empirical EM with dependent samples, which may be of independent interest.

artificial intelligence, machine learning, probability, (18 more...)

arXiv.org Artificial Intelligence

2308.11518

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.72)

Add feedback

Econometrics of Machine Learning Methods in Economic Forecasting

Babii, Andrii, Ghysels, Eric, Striaukas, Jonas

arXiv.org Machine LearningAug-21-2023

This paper surveys the recent advances in machine learning method for economic forecasting. The survey covers the following topics: nowcasting, textual data, panel and tensor data, high-dimensional Granger causality tests, time series cross-validation, classification with economic losses.

econometrics, ghysel, regression, (15 more...)

arXiv.org Machine Learning

2308.10993

Country:

North America > United States > North Carolina > Orange County > Chapel Hill (0.14)
North America > United States > New York (0.04)
Africa > Senegal > Kolda Region > Kolda (0.04)
(3 more...)

Genre:

Research Report (1.00)
Overview (1.00)

Industry: Banking & Finance > Economy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)
(2 more...)

Add feedback

Instance-based Learning with Prototype Reduction for Real-Time Proportional Myocontrol: A Randomized User Study Demonstrating Accuracy-preserving Data Reduction for Prosthetic Embedded Systems

Sziburis, Tim, Nowak, Markus, Brunelli, Davide

arXiv.org Artificial IntelligenceAug-21-2023

This work presents the design, implementation and validation of learning techniques based on the kNN scheme for gesture detection in prosthetic control. To cope with high computational demands in instance-based prediction, methods of dataset reduction are evaluated considering real-time determinism to allow for the reliable integration into battery-powered portable devices. The influence of parameterization and varying proportionality schemes is analyzed, utilizing an eight-channel-sEMG armband. Besides offline cross-validation accuracy, success rates in real-time pilot experiments (online target achievement tests) are determined. Based on the assessment of specific dataset reduction techniques' adequacy for embedded control applications regarding accuracy and timing behaviour, Decision Surface Mapping (DSM) proves itself promising when applying kNN on the reduced set. A randomized, double-blind user study was conducted to evaluate the respective methods (kNN and kNN with DSM-reduction) against Ridge Regression (RR) and RR with Random Fourier Features (RR-RFF). The kNN-based methods performed significantly better (p<0.0005) than the regression techniques. Between DSM-kNN and kNN, there was no statistically significant difference (significance level 0.05). This is remarkable in consideration of only one sample per class in the reduced set, thus yielding a reduction rate of over 99% while preserving success rate. The same behaviour could be confirmed in an extended user study. With k=1, which turned out to be an excellent choice, the runtime complexity of both kNN (in every prediction step) as well as DSM-kNN (in the training phase) becomes linear concerning the number of original samples, favouring dependable wearable prosthesis applications.

artificial intelligence, evolutionary algorithm, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2308.11019

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Michigan (0.04)
(7 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study > Negative Result (0.47)

Industry:

Health & Medicine > Health Care Technology (1.00)
Energy (1.00)
Health & Medicine > Therapeutic Area > Orthopedics/Orthopedic Surgery (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Architecture (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.82)
(2 more...)

Add feedback

Economic Policy Uncertainty: A Review on Applications and Measurement Methods with Focus on Text Mining Methods

Kaveh-Yazdy, Fatemeh, Zarifzadeh, Sajjad

arXiv.org Artificial IntelligenceAug-20-2023

Economic Policy Uncertainty (EPU) represents the uncertainty realized by the investors during economic policy alterations. EPU is a critical indicator in economic studies to predict future investments, the unemployment rate, and recessions. EPU values can be estimated based on financial parameters directly or implied uncertainty indirectly using the text mining methods. Although EPU is a well-studied topic within the economy, the methods utilized to measure it are understudied. In this article, we define the EPU briefly and review the methods used to measure the EPU, and survey the areas influenced by the changes in EPU level. We divide the EPU measurement methods into three major groups with respect to their input data. Examples of each group of methods are enlisted, and the pros and cons of the groups are discussed. Among the EPU measures, text mining-based ones are dominantly studied. These methods measure the realized uncertainty by taking into account the uncertainty represented in the news and publicly available sources of financial information. Finally, we survey the research areas that rely on measuring the EPU index with the hope that studying the impacts of uncertainty would attract further attention of researchers from various research fields. In addition, we propose a list of future research approaches focusing on measuring EPU using textual material.

economic policy uncertainty, economics, policy uncertainty, (13 more...)

arXiv.org Artificial Intelligence

2308.10304

Country:

North America > Mexico (0.14)
Asia > Macao (0.04)
Asia > Japan (0.04)
(25 more...)

Genre: Research Report > New Finding (0.93)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Banking & Finance > Economy (1.00)
Government > Regional Government > North America Government > United States Government (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Data Science > Data Mining > Text Mining (0.81)

Add feedback

Learning Rate Schedules in the Presence of Distribution Shift

Fahrbach, Matthew, Javanmard, Adel, Mirrokni, Vahab, Worah, Pratik

arXiv.org Artificial IntelligenceAug-20-2023

We design learning rate schedules that minimize regret for SGD-based online learning in the presence of a changing data distribution. We fully characterize the optimal learning rate schedule for online linear regression via a novel analysis with stochastic differential equations. For general convex loss functions, we propose new learning rate schedules that are robust to distribution shift and we give upper and lower bounds for the regret that only differ by constants. For non-convex loss functions, we define a notion of regret based on the gradient norm of the estimated models and propose a learning schedule that minimizes an upper bound on the total expected regret. Intuitively, one expects changing loss landscapes to require more exploration, and we confirm that optimal learning rate schedules typically increase in the presence of distribution shift. Finally, we provide experiments for high-dimensional regression models and neural networks to illustrate these learning rate schedules and their cumulative regret.

artificial intelligence, distribution shift, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2303.15634

Country:

North America > United States > California (0.14)
Asia > Middle East > Jordan (0.04)

Genre:

Workflow (0.68)
Research Report > New Finding (0.46)

Industry:

Health & Medicine (0.68)
Education > Educational Setting (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.69)

Add feedback

Global Warming In Ghana's Major Cities Based on Statistical Analysis of NASA's POWER Over 3-Decades

Attih, Joshua

arXiv.org Artificial IntelligenceAug-19-2023

Global warming's impact on high temperatures in various parts of the world has raised concerns. This study investigates long-term temperature trends in four major Ghanaian cities representing distinct climatic zones. Using NASA's Prediction of Worldwide Energy Resource (POWER) data, statistical analyses assess local climate warming and its implications. Linear regression trend analysis and eXtreme Gradient Boosting (XGBoost) machine learning predict temperature variations. Land Surface Temperature (LST) profile maps generated from the RSLab platform enhance accuracy. Results reveal local warming trends, particularly in industrialized Accra. Demographic factors aren't significant. XGBoost model's low Root Mean Square Error (RMSE) scores demonstrate effectiveness in capturing temperature patterns. Wa unexpectedly has the highest mean temperature. Estimated mean temperatures for mid-2023 are: Accra 27.86{\deg}C, Kumasi 27.15{\deg}C, Kete-Krachi 29.39{\deg}C, and Wa 30.76{\deg}C. These findings improve understanding of local climate warming for policymakers and communities, aiding climate change strategies.

artificial intelligence, machine learning, mean temperature, (18 more...)

arXiv.org Artificial Intelligence

2308.10909

Country:

Africa > Ghana > Greater Accra > Accra (0.48)
Africa > Ghana > Ashanti > Kumasi (0.28)
Africa > West Africa (0.04)
(9 more...)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.48)

Industry:

Government > Space Agency (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Energy > Renewable (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)

Add feedback

High Performance Computing Applied to Logistic Regression: A CPU and GPU Implementation Comparison

Mohammed, Nechba, Mohamed, Mouhajir, Yassine, Sedjari

arXiv.org Artificial IntelligenceAug-19-2023

We present a versatile GPU-based parallel version of Logistic Regression (LR), aiming to address the increasing demand for faster algorithms in binary classification due to large data sets. Our implementation is a direct translation of the parallel Gradient Descent Logistic Regression algorithm proposed by X. Zou et al. [12]. Our experiments demonstrate that our GPU-based LR outperforms existing CPU-based implementations in terms of execution time while maintaining comparable f1 score. The significant acceleration of processing large datasets makes our method particularly advantageous for real-time prediction applications like image recognition, spam detection, and fraud detection. Our algorithm is implemented in a ready-to-use Python library available at : https://github.com/NechbaMohammed/SwiftLogisticReg

algorithm, artificial intelligence, machine learning, (13 more...)

arXiv.org Artificial Intelligence

2308.10037

Country:

Africa > Middle East > Morocco > Rabat-Salé-Kénitra Region > Rabat (0.05)
Europe > Romania > Centru Development Region > Brașov County > Brașov (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Add feedback

Inductive-bias Learning: Generating Code Models with Large Language Model

Tanaka, Toma, Emoto, Naofumi, Yumibayashi, Tsukasa

arXiv.org Artificial IntelligenceAug-18-2023

Large Language Models(LLMs) have been attracting attention due to a ability called in-context learning(ICL). ICL, without updating the parameters of a LLM, it is possible to achieve highly accurate inference based on rules ``in the context'' by merely inputting a training data into the prompt. Although ICL is a developing field with many unanswered questions, LLMs themselves serves as a inference model, seemingly realizing inference without explicitly indicate ``inductive bias''. On the other hand, a code generation is also a highlighted application of LLMs. The accuracy of code generation has dramatically improved, enabling even non-engineers to generate code to perform the desired tasks by crafting appropriate prompts. In this paper, we propose a novel ``learning'' method called an ``Inductive-Bias Learning (IBL)'', which combines the techniques of ICL and code generation. An idea of IBL is straightforward. Like ICL, IBL inputs a training data into the prompt and outputs a code with a necessary structure for inference (we referred to as ``Code Model'') from a ``contextual understanding''. Despite being a seemingly simple approach, IBL encompasses both a ``property of inference without explicit inductive bias'' inherent in ICL and a ``readability and explainability'' of the code generation. Surprisingly, generated Code Models have been found to achieve predictive accuracy comparable to, and in some cases surpassing, ICL and representative machine learning models. Our IBL code is open source: https://github.com/fuyu-quant/IBLM

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2308.0989

Country: Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.30)

Add feedback

Spectral information criterion for automatic elbow detection

Martino, L., Millan-Castillo, R. San, Morgado, E.

arXiv.org Artificial IntelligenceAug-17-2023

We introduce a generalized information criterion that contains other well-known information criteria, such as Bayesian information Criterion (BIC) and Akaike information criterion (AIC), as special cases. Furthermore, the proposed spectral information criterion (SIC) is also more general than the other information criteria, e.g., since the knowledge of a likelihood function is not strictly required. SIC extracts geometric features of the error curve and, as a consequence, it can be considered an automatic elbow detector. SIC provides a subset of all possible models, with a cardinality that often is much smaller than the total number of possible models. The elements of this subset are elbows of the error curve. A practical rule for selecting a unique model within the sets of elbows is suggested as well. Theoretical invariance properties of SIC are analyzed. Moreover, we test SIC in ideal scenarios where provides always the optimal expected results. We also test SIC in several numerical experiments: some involving synthetic data, and two experiments involving real datasets. They are all real-world applications such as clustering, variable selection, or polynomial order selection, to name a few. The results show the benefits of the proposed scheme. Matlab code related to the experiments is also provided. Possible future research lines are finally discussed.

artificial intelligence, information criterion, machine learning, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.1016/j.eswa.2023.120705

2308.09108

Country:

Europe > Spain > Galicia > Madrid (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
South America > Argentina > Patagonia > Río Negro Province > Viedma (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine > Therapeutic Area (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Add feedback