AITopics | Regression

Regression

Data Augmentation for Mental Health Classification on Social Media

Ansari, Gunjan, Garg, Muskan, Saxena, Chandni

arXiv.org Artificial Intelligence, Dec-19-2021

The mental disorders of online users are inferred from their social media posts. The major challenge in this domain is obtaining ethical clearance to use user-generated text from social media platforms. Academic researchers have identified the problem of insufficient and unlabeled data for mental health classification. To handle this issue, we study the effect of data augmentation techniques on domain-specific user-generated text for mental health classification. Among the well-established data augmentation techniques, we identify Easy Data Augmentation (EDA), conditional BERT, and Back Translation (BT) as potential techniques for generating additional text to improve classifier performance. Further, three classifiers, Random Forest (RF), Support Vector Machine (SVM), and Logistic Regression (LR), are employed to analyze the impact of data augmentation on two publicly available social media datasets. The experimental results show significant improvements in classifier performance when trained on the augmented data.
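To make the EDA idea in the abstract above concrete, here is a minimal sketch of two of its token-level operations, random swap and random deletion. It is an illustration under assumptions, not the authors' pipeline: the function names, default parameters, and the example sentence are hypothetical, and the other two EDA operations (synonym replacement and random insertion) are left out because they need a thesaurus.

```python
import random


def random_swap(words, n_swaps, rng):
    """Swap two randomly chosen positions, n_swaps times."""
    words = list(words)
    if len(words) < 2:
        return words
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words


def random_deletion(words, p, rng):
    """Drop each word with probability p, always keeping at least one word."""
    if len(words) <= 1:
        return list(words)
    kept = [w for w in words if rng.random() > p]
    return kept if kept else [rng.choice(words)]


def augment(sentence, n_aug=4, p_delete=0.1, n_swaps=1, seed=0):
    """Return n_aug perturbed copies, alternating swap and deletion.

    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    words = sentence.split()
    out = []
    for k in range(n_aug):
        if k % 2 == 0:
            out.append(" ".join(random_swap(words, n_swaps, rng)))
        else:
            out.append(" ".join(random_deletion(words, p_delete, rng)))
    return out


if __name__ == "__main__":
    # Made-up example post, used only to show the call.
    for variant in augment("i have not been sleeping well for weeks"):
        print(variant)
```

Each augmented copy would keep the label of the post it came from, which is how such techniques enlarge a small labeled training set.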

artificial intelligence, machine learning, social media, (13 more...)

arXiv.org Artificial Intelligence

2112.10064

Country:

Asia > India (0.04)
Asia > China > Hong Kong (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.47)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.49)

All the Statistical Tests You Must Do for a Good Linear Regression

#artificialintelligence, Dec-18-2021, 04:36:15 GMT

The idea of this post is to show the many statistical tests that surround a Linear Regression. I know that it may sound repetitive ("Yet another post about Linear Regression"), but the information I am about to cover is not as widely known as we may think. Don't worry, I will leave the entire code at the end, where you will be able to see what I have imported for each test. As a dataset, I will be using a "toy dataset" from sklearn about wines. For modeling and testing, I will use statsmodels, as it has all of the tests needed in the library.
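The teaser above does not name the individual tests, so the following is only a sketch of one check that is commonly run on the residuals of a linear regression: the Durbin-Watson statistic for first-order autocorrelation. Whether the post covers this particular test is an assumption. The code uses plain Python instead of statsmodels, and the data points are made up for illustration.

```python
def fit_simple_ols(x, y):
    """Closed-form least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def durbin_watson(residuals):
    """Sum of squared successive differences over sum of squared residuals.

    The statistic lies between 0 and 4; values near 2 suggest little
    first-order autocorrelation in the residuals.
    """
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


if __name__ == "__main__":
    # Made-up observations, not the wine dataset from the post.
    x = [1, 2, 3, 4, 5, 6, 7, 8]
    y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.1, 6.8, 8.2]
    intercept, slope = fit_simple_ols(x, y)
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    print("slope:", slope)
    print("Durbin-Watson:", durbin_watson(residuals))
```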

dataset, good linear regression, linear regression, (15 more...)

#artificialintelligence

Genre: Research Report (0.30)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.96)

Off-Policy Evaluation Using Information Borrowing and Context-Based Switching

Dasgupta, Sutanoy, Niu, Yabo, Panaganti, Kishan, Kalathil, Dileep, Pati, Debdeep, Mallick, Bani

arXiv.org Machine Learning, Dec-18-2021

We consider the off-policy evaluation (OPE) problem in contextual bandits, where the goal is to estimate the value of a target policy using the data collected by a logging policy. Most popular approaches to the OPE are variants of the doubly robust (DR) estimator obtained by combining a direct method (DM) estimator and a correction term involving the inverse propensity score (IPS). Existing algorithms primarily focus on strategies to reduce the variance of the DR estimator arising from large IPS. We propose a new approach called the Doubly Robust with Information borrowing and Context-based switching (DR-IC) estimator that focuses on reducing both bias and variance. The DR-IC estimator replaces the standard DM estimator with a parametric reward model that borrows information from the 'closer' contexts through a correlation structure that depends on the IPS. The DR-IC estimator also adaptively interpolates between this modified DM estimator and a modified DR estimator based on a context-specific switching rule. We give provable guarantees on the performance of the DR-IC estimator. We also demonstrate the superior performance of the DR-IC estimator compared to the state-of-the-art OPE algorithms on a number of benchmark problems.
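As a point of reference for the abstract above, the standard doubly robust (DR) estimator it builds on can be sketched as follows. This is the textbook DR estimator, not the proposed DR-IC estimator: it adds an importance-weighted correction to a direct-method estimate. The log format `(context, action, reward, logging_propensity)` and all function names are assumptions made for this illustration.

```python
def doubly_robust_value(logs, target_policy, reward_model, actions):
    """Standard DR estimate of a target policy's value.

    logs: iterable of (context, action, reward, logging_propensity)
    target_policy(context, action): probability the target policy picks action
    reward_model(context, action): estimated expected reward (the DM part)
    """
    total = 0.0
    n = 0
    for x, a, r, mu in logs:
        # Direct-method term: model-based value of the target policy at x.
        dm = sum(target_policy(x, b) * reward_model(x, b) for b in actions)
        # Inverse propensity score weight for the logged action.
        w = target_policy(x, a) / mu
        # Correction term: reweighted residual of the reward model.
        total += dm + w * (r - reward_model(x, a))
        n += 1
    return total / n


if __name__ == "__main__":
    # Toy logs from a uniform logging policy over two actions.
    logs = [("u", 1, 1.0, 0.5), ("v", 0, 0.0, 0.5)]

    def always_one(x, a):
        return 1.0 if a == 1 else 0.0

    def zero_model(x, a):
        return 0.0

    # With a zero reward model, DR reduces to the plain IPS estimate.
    print(doubly_robust_value(logs, always_one, zero_model, [0, 1]))
```

When the logging propensity `mu` is small, the weight `w` becomes large, which is the variance problem the abstract refers to.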

estimator, importance weight, reward model, (13 more...)

arXiv.org Machine Learning

2112.09865

Country:

North America > United States > Texas > Brazos County > College Station (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)

Explainable Deep Reinforcement Learning for Portfolio Management: An Empirical Approach

Guan, Mao, Liu, Xiao-Yang

arXiv.org Artificial Intelligence, Dec-18-2021

Deep reinforcement learning (DRL) has been widely studied in the portfolio management task. However, it is challenging to understand a DRL-based trading strategy because of the black-box nature of deep neural networks. In this paper, we propose an empirical approach to explain the strategies of DRL agents for the portfolio management task. First, we use a linear model in hindsight as the reference model, which finds the best portfolio weights by assuming the actual stock returns are known in foresight. In particular, we use the coefficients of a linear model in hindsight as the reference feature weights. Secondly, for DRL agents, we use integrated gradients to define the feature weights, which are the coefficients between reward and features under a linear regression model. Thirdly, we study the prediction power in two cases, single-step prediction and multi-step prediction. In particular, we quantify the prediction power by calculating the linear correlations between the feature weights of a DRL agent and the reference feature weights, and similarly for machine learning methods. Finally, we evaluate a portfolio management task on Dow Jones 30 constituent stocks from 01/01/2009 to 09/01/2021. Our approach empirically reveals that a DRL agent exhibits a stronger multi-step prediction power than machine learning methods.
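The "linear correlation" between two sets of feature weights mentioned in the abstract above can be read as a Pearson correlation coefficient. The sketch below assumes that reading; the weight vectors are hypothetical numbers, not results from the paper.

```python
import math


def pearson_correlation(a, b):
    """Pearson correlation between two equal-length weight vectors."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


if __name__ == "__main__":
    # Hypothetical feature weights for one stock, for illustration only.
    agent_weights = [0.10, 0.40, -0.20, 0.30]
    reference_weights = [0.05, 0.50, -0.10, 0.20]
    print(pearson_correlation(agent_weights, reference_weights))
```

Under this reading, a value closer to 1 means the agent's feature weights line up more closely with the hindsight reference weights.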

drl agent, feature weight, prediction power, (9 more...)

arXiv.org Artificial Intelligence

2111.03995

Country:

North America > United States > New York > New York County > New York City (0.46)
North America > United States > New York > Richmond County > New York City (0.14)
North America > United States > New York > Queens County > New York City (0.14)
(2 more...)

Genre: Research Report (0.50)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.57)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

TensorFlow - Hands-on Machine Learning with TensorFlow

#artificialintelligence, Dec-17-2021, 10:18:52 GMT

Learn how to build Machine Learning projects in this TensorFlow Course created by The Click Reader. In this course, you will learn about scalars as well as tensors and how to create them using TensorFlow. You will also learn how to perform various kinds of tensor operations for manipulating and changing tensor values.

hand-on machine learning, machine learning, tensorflow, (3 more...)

#artificialintelligence

Genre: Instructional Material > Course Syllabus & Notes (0.51)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.40)

Trend Following with Logistic Regression

#artificialintelligence, Dec-17-2021, 00:31:11 GMT

In this post, we'll cover a pragmatic logistic regression classifier to mimic a trend following strategy for the S&P 500 ETF, SPY. The pipeline takes in daily prices for SPY along with several SPDR sector ETFs and macro ETFs for gold, Yen, Swiss Franc etc. Once all Open, High, Low, Close, and Volume data has been received from yfinance, a feature space (the set of columns, in spreadsheet terms) is built using select indicators included in TA-lib. The features are then reduced to 4 components (n_components) with Principal Component Analysis; the model is trained on these principal components, using ground truth labels generated by a brute force optimized dual moving average crossover. Initially, I opted to use the default boundary of .5 for the binary classification. On visual inspection, there is a gap in this logic, as the classifier appears exceedingly optimistic (subjective).
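Two pieces of the pipeline described above are easy to show in isolation: labels from a dual moving average crossover, and the decision boundary applied to predicted probabilities. The sketch below is not the post's code. The window lengths, prices, and probabilities are hypothetical, and the yfinance, TA-lib, PCA, and model-fitting steps are left out.

```python
def sma(values, window):
    """Simple moving average; None until a full window is available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out


def crossover_labels(closes, fast, slow):
    """Label 1 when the fast average is above the slow one, else 0.

    Positions without a full slow window are labeled None.
    """
    if fast >= slow:
        raise ValueError("fast window must be shorter than slow window")
    fast_avg = sma(closes, fast)
    slow_avg = sma(closes, slow)
    return [None if s is None else int(f > s)
            for f, s in zip(fast_avg, slow_avg)]


def classify(probabilities, boundary=0.5):
    """Turn predicted probabilities into 0/1 calls at a chosen boundary."""
    return [int(p >= boundary) for p in probabilities]


if __name__ == "__main__":
    closes = [1, 2, 3, 4, 5, 4, 3, 2, 1]  # made-up prices
    print(crossover_labels(closes, fast=2, slow=3))
    probs = [0.40, 0.55, 0.70]  # made-up model outputs
    print(classify(probs))        # default boundary of 0.5
    print(classify(probs, 0.6))   # a stricter, hypothetical boundary
```

Raising the boundary above 0.5 produces fewer positive calls, which is one way to temper a classifier that looks too optimistic.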

boundary, decision boundary, logistic regression, (6 more...)

#artificialintelligence

Genre:

Research Report > New Finding (0.62)
Research Report > Experimental Study (0.62)

Industry: Banking & Finance > Trading (0.75)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Sampling To Improve Predictions For Underrepresented Observations In Imbalanced Data

Kjærsgaard, Rune D., Grønberg, Manja G., Clemmensen, Line K. H.

arXiv.org Machine Learning, Dec-16-2021

Data imbalance is common in production data, where controlled production settings require data to fall within a narrow range of variation and data are collected with quality assessment in mind, rather than data analytic insights. This imbalance negatively impacts the predictive performance of models on underrepresented observations. We propose sampling to adjust for this imbalance with the goal of improving the performance of models trained on historical production data. We investigate the use of three sampling approaches to adjust for imbalance. The goal is to downsample the covariates in the training data and subsequently fit a regression model. We investigate how the predictive power of the model changes when using either the sampled or the original data for training. We apply our methods on a large biopharmaceutical manufacturing data set from an advanced simulation of penicillin production and find that fitting a model using the sampled data gives a small reduction in the overall predictive performance, but yields a systematically better performance on underrepresented observations. In addition, the results emphasize the need for alternative, fair, and balanced model evaluations.

nearest neighbour, training data, underrepresented observation, (15 more...)

arXiv.org Machine Learning

2111.09065

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.71)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.35)

Consumer adoption of telemedicine in 2021

#artificialintelligence, Dec-14-2021, 07:50:24 GMT

Thank you to the Stanford Center of Digital Health for their continued collaboration on this work, with special gratitude to Natasha Din, MD, Clark Seninger, MBA, Sravya Rallapalli, Ashish Sarraju, MD, James Tooley, MD, Krishna Pundi, MD, Mario Funes-Hernandez, MD, and Mintu Turakhia, MD. Nearly two years into the COVID-19 pandemic, more consumers have used telemedicine than ever before. Venture investment in telemedicine is up, and big and small players are making land grabs for their share of the market, with many rolling out virtual-first care offerings. So with these accelerants--balanced with the full return of in-person care--what's the state of telemedicine? To answer this question and many more, we have surveyed U.S. adults every year since 2015 to check in with consumers and their relationship to digital health.

consumer adoption, modality, telemedicine, (7 more...)

#artificialintelligence

Industry: Health & Medicine > Health Care Technology > Telehealth (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.31)

Variable Selection and Regularization via Arbitrary Rectangle-range Generalized Elastic Net

Ding, Yujia, Peng, Qidi, Song, Zhengming, Chen, Hansen

arXiv.org Machine Learning, Dec-14-2021

We introduce the arbitrary rectangle-range generalized elastic net penalty method, abbreviated to ARGEN, for performing constrained variable selection and regularization in high-dimensional sparse linear models. As a natural extension of the nonnegative elastic net penalty method, ARGEN is proved to have variable selection consistency and estimation consistency under some conditions. The asymptotic behavior in distribution of the ARGEN estimators has been studied. We also propose an algorithm called MU-QP-RR-W-$l_1$ to efficiently solve ARGEN. By conducting a simulation study, we show that ARGEN outperforms the elastic net in a number of settings. Finally, an application of S&P 500 index tracking with constraints on the stock allocations is performed to provide general guidance for adapting ARGEN to solve real-world problems.

argen, coefficient, consistency, (16 more...)

arXiv.org Machine Learning

2112.07785

Country:

North America > United States > California (0.04)
North America > United States > New York (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
Europe (0.04)

Genre: Research Report (1.00)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Triangulation candidates for Bayesian optimization

Gramacy, Robert B., Sauer, Annie, Wycoff, Nathan

arXiv.org Machine Learning, Dec-14-2021

Bayesian optimization is a form of sequential design: idealize input-output relationships with a suitably flexible nonlinear regression model; fit to data from an initial experimental campaign; devise and optimize a criterion for selecting the next experimental condition(s) under the fitted model (e.g., via predictive equations) to target outcomes of interest (say minima); repeat after acquiring output under those conditions and updating the fit. In many situations this "inner optimization" over the new-data acquisition criterion is cumbersome because it is non-convex/highly multi-modal, may be non-differentiable, or may otherwise thwart numerical optimizers, especially when inference requires Monte Carlo. In such cases it is not uncommon to replace continuous search with a discrete one over random candidates. Here we propose using candidates based on a Delaunay triangulation of the existing input design. In addition to detailing construction of these "tricands", based on a simple wrapper around a conventional convex hull library, we promote several advantages based on properties of the geometric criterion involved. We then demonstrate empirically how tricands can lead to better Bayesian optimization performance compared to both numerically optimized acquisitions and random candidate-based alternatives on benchmark problems.

acquisition, optimization, tricand, (15 more...)

arXiv.org Machine Learning

2112.07457

Country:

North America > United States > Virginia (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
(2 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.54)
