AITopics | Regression

Collaborating Authors

Regression

News Overviews Instructional Materials AI-Alerts Classics

The ART of Transfer Learning: An Adaptive and Robust Pipeline

arXiv.org Artificial IntelligenceApr-30-2023

Transfer learning is an essential tool for improving the performance of primary tasks by leveraging information from auxiliary data resources. In this work, we propose Adaptive Robust Transfer Learning (ART), a flexible pipeline of performing transfer learning with generic machine learning algorithms. We establish the non-asymptotic learning theory of ART, providing a provable theoretical guarantee for achieving adaptive transfer while preventing negative transfer. Additionally, we introduce an ART-integrated-aggregating machine that produces a single final model when multiple candidate algorithms are considered. We demonstrate the promising performance of ART through extensive empirical studies on regression, classification, and sparse learning. We further present a real-data analysis for a mortality study.

artificial intelligence, auxiliary data, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2305.0052

Country: North America > United States (1.00)

Genre: Research Report > Experimental Study (0.68)

Industry:

Health & Medicine (1.00)
Energy > Oil & Gas > Midstream (0.97)
Materials > Chemicals > Industrial Gases > Liquified Gas (0.72)
Materials > Chemicals > Commodity Chemicals > Petrochemicals > LNG (0.72)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Add feedback

Predictability of Machine Learning Algorithms and Related Feature Extraction Techniques

Dong, Yunbo

arXiv.org Artificial IntelligenceApr-30-2023

To implement machine learning, it is essential to first determine an appropriate algorithm for the dataset. Different algorithms may produce a large number of different models with different hyperparameter configurations, and it usually takes a lot of time to run the model on a large dataset when the model is relatively complex. Therefore, how to predict the performance of a model on a dataset is an fundamental problem to be solved. This thesis designs a prediction system based on matrix factorization to predict the classification accuracy of a specific model on a particular dataset. In this thesis, we conduct a comprehensive empirical research on more than fifty datasets that we collected from the openml web site.

artificial intelligence, dataset, machine learning algorithm, (11 more...)

arXiv.org Artificial Intelligence

2305.00449

Country:

North America > United States > New York (0.04)
North America > Canada > Ontario > National Capital Region > Ottawa (0.04)
Europe > Netherlands > North Brabant > Eindhoven (0.04)
(3 more...)

Genre: Research Report > New Finding (0.45)

Industry:

Leisure & Entertainment (0.67)
Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.99)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.48)
(2 more...)

Add feedback

EBLIME: Enhanced Bayesian Local Interpretable Model-agnostic Explanations

Zhong, Yuhao, Bhattacharya, Anirban, Bukkapatnam, Satish

arXiv.org Artificial IntelligenceApr-29-2023

We propose EBLIME to explain black-box machine learning models and obtain the distribution of feature importance using Bayesian ridge regression models. We provide mathematical expressions of the Bayesian framework and theoretical outcomes including the significance of ridge parameter. Case studies were conducted on benchmark datasets and a real-world industrial application of locating internal defects in manufactured products. Compared to the state-of-the-art methods, EBLIME yields more intuitive and accurate results, with better uncertainty quantification in terms of deriving the posterior distribution, credible intervals, and rankings of the feature importance.

artificial intelligence, eblime, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2305.00213

Country: North America > United States > Texas > Brazos County > College Station (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

Add feedback

Data-Driven Subgroup Identification for Linear Regression

Izzo, Zachary, Liu, Ruishan, Zou, James

arXiv.org Artificial IntelligenceApr-29-2023

Medical studies frequently require to extract the relationship between each covariate and the outcome with statistical confidence measures. To do this, simple parametric models are frequently used (e.g. coefficients of linear regression) but usually fitted on the whole dataset. However, it is common that the covariates may not have a uniform effect over the whole population and thus a unified simple model can miss the heterogeneous signal. For example, a linear model may be able to explain a subset of the data but fail on the rest due to the nonlinearity and heterogeneity in the data. In this paper, we propose DDGroup (data-driven group discovery), a data-driven method to effectively identify subgroups in the data with a uniform linear relationship between the features and the label. DDGroup outputs an interpretable region in which the linear model is expected to hold. It is simple to implement and computationally tractable for use. We show theoretically that, given a large enough sample, DDGroup recovers a region where a single linear model with low variance is well-specified (if one exists), and experiments on real-world medical datasets confirm that it can discover regions where a local linear model has improved performance. Our experiments also show that DDGroup can uncover subgroups with qualitatively different relationships which are missed by simply applying parametric approaches to the whole dataset.

artificial intelligence, machine learning, probability, (13 more...)

arXiv.org Artificial Intelligence

2305.00195

Country:

Asia > China (0.05)
Africa > South Sudan > Equatoria > Central Equatoria > Juba (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)
(5 more...)

Genre:

Research Report > Experimental Study (0.67)
Research Report > New Finding (0.67)

Industry:

Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.94)
Health & Medicine > Pharmaceuticals & Biotechnology (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.87)

Add feedback

Sparse Private LASSO Logistic Regression

Khanna, Amol, Lu, Fred, Raff, Edward, Testa, Brian

arXiv.org Artificial IntelligenceApr-28-2023

LASSO regularized logistic regression is particularly useful for its built-in feature selection, allowing coefficients to be removed from deployment and producing sparse solutions. Differentially private versions of LASSO logistic regression have been developed, but generally produce dense solutions, reducing the intrinsic utility of the LASSO penalty. In this paper, we present a differentially private method for sparse logistic regression that maintains hard zeros. Our key insight is to first train a non-private LASSO logistic regression model to determine an appropriate privatized number of non-zero coefficients to use in final model selection. To demonstrate our method's performance, we run experiments on synthetic and real-world datasets.

algorithm, artificial intelligence, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2304.12429

Country:

North America > United States > Maryland > Baltimore County (0.04)
North America > United States > Maryland > Baltimore (0.04)
North America > United States > Maryland > Anne Arundel County > Annapolis (0.04)
(4 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Add feedback

Enhancing Supply Chain Resilience: A Machine Learning Approach for Predicting Product Availability Dates Under Disruption

Camur, Mustafa Can, Ravi, Sandipp Krishnan, Saleh, Shadi

arXiv.org Artificial IntelligenceApr-28-2023

The COVID 19 pandemic and ongoing political and regional conflicts have a highly detrimental impact on the global supply chain, causing significant delays in logistics operations and international shipments. One of the most pressing concerns is the uncertainty surrounding the availability dates of products, which is critical information for companies to generate effective logistics and shipment plans. Therefore, accurately predicting availability dates plays a pivotal role in executing successful logistics operations, ultimately minimizing total transportation and inventory costs. We investigate the prediction of product availability dates for General Electric (GE) Gas Power's inbound shipments for gas and steam turbine service and manufacturing operations, utilizing both numerical and categorical features. We evaluate several regression models, including Simple Regression, Lasso Regression, Ridge Regression, Elastic Net, Random Forest (RF), Gradient Boosting Machine (GBM), and Neural Network models. Based on real world data, our experiments demonstrate that the tree based algorithms (i.e., RF and GBM) provide the best generalization error and outperforms all other regression models tested. We anticipate that our prediction models will assist companies in managing supply chain disruptions and reducing supply chain risks on a broader scale.

artificial intelligence, availability date, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2304.14902

Country:

Asia > China > Shanghai > Shanghai (0.04)
North America > United States > Mississippi (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
(8 more...)

Genre: Research Report (1.00)

Industry:

Transportation > Freight & Logistics Services (1.00)
Energy (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (0.68)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.92)

Add feedback

NAP at SemEval-2023 Task 3: Is Less Really More? (Back-)Translation as Data Augmentation Strategies for Detecting Persuasion Techniques

Falk, Neele, Eichel, Annerose, Piccirilli, Prisca

arXiv.org Artificial IntelligenceApr-27-2023

Persuasion techniques detection in news in a multi-lingual setup is non-trivial and comes with challenges, including little training data. Our system successfully leverages (back-)translation as data augmentation strategies with multi-lingual transformer models for the task of detecting persuasion techniques. The automatic and human evaluation of our augmented data allows us to explore whether (back-)translation aid or hinder performance. Our in-depth analyses indicate that both data augmentation strategies boost performance; however, balancing human-produced and machine-generated data seems to be crucial.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2304.14179

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.04)
Asia > China > Hong Kong (0.04)
(6 more...)

Genre: Research Report > New Finding (0.93)

Industry: Media (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.93)

Add feedback

Lady and the Tramp Nextdoor: Online Manifestations of Economic Inequalities in the Nextdoor Social Network

Iqbal, Waleed, Ghafouri, Vahid, Tyson, Gareth, Suarez-Tangil, Guillermo, Castro, Ignacio

arXiv.org Artificial IntelligenceApr-26-2023

From health to education, income impacts a huge range of life choices. Earlier research has leveraged data from online social networks to study precisely this impact. In this paper, we ask the opposite question: do different levels of income result in different online behaviors? We demonstrate it does. We present the first large-scale study of Nextdoor, a popular location-based social network. We collect 2.6 Million posts from 64,283 neighborhoods in the United States and 3,325 neighborhoods in the United Kingdom, to examine whether online discourse reflects the income and income inequality of a neighborhood. We show that posts from neighborhoods with different incomes indeed differ, e.g. richer neighborhoods have a more positive sentiment and discuss crimes more, even though their actual crime rates are much lower. We then show that user-generated content can predict both income and inequality. We train multiple machine learning models and predict both income (R-squared=0.841) and inequality (R-squared=0.77).

artificial intelligence, machine learning, neighborhood, (19 more...)

arXiv.org Artificial Intelligence

2304.05232

Country:

Europe > United Kingdom > Wales (0.04)
Europe > Spain > Galicia > Madrid (0.04)
North America > United States > Wisconsin (0.04)
(18 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Information Technology > Services (0.91)
Government > Regional Government > North America Government > United States Government (0.67)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Add feedback

FairBalance: How to Achieve Equalized Odds With Data Pre-processing

Yu, Zhe, Chakraborty, Joymallya, Menzies, Tim

arXiv.org Artificial IntelligenceApr-26-2023

This research seeks to benefit the software engineering society by providing a simple yet effective pre-processing approach to achieve equalized odds fairness in machine learning software. Fairness issues have attracted increasing attention since machine learning software is increasingly used for high-stakes and high-risk decisions. Amongst all the existing fairness notions, this work specifically targets "equalized odds" given its advantage in always allowing perfect classifiers. Equalized odds requires that members of every demographic group do not receive disparate mistreatment. Prior works either optimize for an equalized odds related metric during the learning process like a black-box, or manipulate the training data following some intuition. This work studies the root cause of the violation of equalized odds and how to tackle it. We found that equalizing the class distribution in each demographic group with sample weights is a necessary condition for achieving equalized odds without modifying the normal training process. In addition, an important partial condition for equalized odds (zero average odds difference) can be guaranteed when the class distributions are weighted to be not only equal but also balanced (1:1). Based on these analyses, we proposed FairBalance, a pre-processing algorithm which balances the class distribution in each demographic group by assigning calculated weights to the training data. On eight real-world datasets, our empirical results show that, at low computational overhead, the proposed pre-processing algorithm FairBalance can significantly improve equalized odds without much, if any damage to the utility. FairBalance also outperforms existing state-of-the-art approaches in terms of equalized odds. To facilitate reuse, reproduction, and validation, we made our scripts available at https://github.com/hil-se/FairBalance.

artificial intelligence, equalized odds, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2107.0831

Country:

North America > United States > North Carolina (0.04)
North America > United States > New York > New York County > New York City (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.88)

Industry:

Education (0.54)
Health & Medicine (0.46)
Banking & Finance (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Add feedback

A Comparative Analysis of Multiple Methods for Predicting a Specific Type of Crime in the City of Chicago

Djon, Deborah, Jhawar, Jitesh, Drumm, Kieron, Tran, Vincent

arXiv.org Artificial IntelligenceApr-26-2023

Researchers regard crime as a social phenomenon that is influenced by several physical, social, and economic factors. Different types of crimes are said to have different motivations. Theft, for instance, is a crime that is based on opportunity, whereas murder is driven by emotion. In accordance with this, we examine how well a model can perform with only spatiotemporal information at hand when it comes to predicting a single crime. More specifically, we aim at predicting theft, as this is a crime that should be predictable using spatiotemporal information. We aim to answer the question: "How well can we predict theft using spatial and temporal features?". To answer this question, we examine the effectiveness of support vector machines, linear regression, XGBoost, Random Forest, and k-nearest neighbours, using different imbalanced techniques and hyperparameters. XGBoost showed the best results with an F1-score of 0.86.

artificial intelligence, dataset, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2304.13464

Country:

North America > United States > Illinois > Cook County > Chicago (0.42)
North America > United States > California > San Francisco County > San Francisco (0.05)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
(2 more...)

Genre: Research Report > New Finding (0.70)

Industry: Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Add feedback