AITopics | Regression


Regression


When Your Regression Model's Errors Contain Two Peaks

#artificialintelligence | May-30-2020, 16:29:15 GMT

It's good to see that all model coefficients are statistically significant at a p-value of 0.001, i.e. at the 99.9% confidence level. Unlike linear regression models, models in which the dependent variable is a count rarely produce normally distributed residual errors. So we have to normalize the raw residuals by other means.
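The excerpt does not say which normalization the article goes on to use. As one common possibility, the sketch below computes Pearson residuals, which divide each raw residual by the square root of the predicted mean; this assumes a Poisson-type model in which the variance equals the mean. The function name and the sample numbers are hypothetical and chosen only for illustration.

```python
import math


def pearson_residuals(observed, predicted):
    """Scale raw residuals by sqrt(predicted mean).

    Assumes a Poisson-type count model (variance == mean); this is an
    illustrative choice, not necessarily the one used in the article.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return [(y - mu) / math.sqrt(mu) for y, mu in zip(observed, predicted)]


# Hypothetical observed counts and model-predicted means.
observed_counts = [0, 3, 9, 20]
predicted_means = [1.0, 4.0, 9.0, 16.0]

residuals = pearson_residuals(observed_counts, predicted_means)
print(residuals)  # [-1.0, -0.5, 0.0, 1.0]
```

Because each residual is expressed in units of the model's own predicted spread, large and small predicted counts become comparable on one scale.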

artificial intelligence, machine learning, regression model, (17 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.97)


How To Code Linear Regression From Scratch -- Quick & Easy!

#artificialintelligence | May-30-2020, 01:58:39 GMT

Here, we load the chocolate data into our program using pandas; we also drop two of the columns we won't be using in our calculation: competitorname and winpercent. Our y becomes the first column in the dataset, which indicates whether our specific sweet is chocolate (1) or not (0). The remaining columns are used as variables/features to predict our y and thus become our X. If you're confused about what we're doing with …[:, 0][:,np.newaxis] on line 5, this is to turn y into a column. We simply add a new dimension to convert the flat, one-dimensional vector into a vertical column!
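The effect of the `[:, 0][:, np.newaxis]` indexing can be seen on a small array. This is a minimal sketch: the `data` array below is a hypothetical stand-in for the loaded dataset (label in column 0, features in the remaining columns), not the article's actual data or code.

```python
import numpy as np

# Hypothetical stand-in for the loaded data:
# column 0 is the label, the other columns are features.
data = np.array([
    [1, 0, 1],
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
])

y_flat = data[:, 0]        # 1-D vector of labels, shape (4,)
y = y_flat[:, np.newaxis]  # same values as a column, shape (4, 1)
X = data[:, 1:]            # remaining columns as features, shape (4, 2)

print(y_flat.shape, y.shape, X.shape)  # (4,) (4, 1) (4, 2)
```

Keeping y as an explicit column means matrix products such as `X @ weights` and y share the same two-dimensional shape, which avoids accidental broadcasting when the two are subtracted.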

artificial intelligence, code linear regression, machine learning, (2 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.40)


On lower bounds for the bias-variance trade-off

Derumigny, Alexis, Schmidt-Hieber, Johannes

arXiv.org Machine Learning | May-30-2020

It is a common phenomenon that for high-dimensional and nonparametric statistical models, rate-optimal estimators balance squared bias and variance. Although this balancing is widely observed, little is known about whether methods exist that could avoid the trade-off between bias and variance. We propose a general strategy to obtain lower bounds on the variance of any estimator with bias smaller than a prespecified bound. This shows to what extent the bias-variance trade-off is unavoidable and allows one to quantify the loss of performance for methods that do not obey it. The approach is based on a number of abstract lower bounds for the variance involving the change of expectation with respect to different probability measures as well as information measures such as the Kullback-Leibler or chi-square divergence. Some of these inequalities rely on a new concept of information matrices. In the second part of the article, the abstract lower bounds are applied to several statistical models including the Gaussian white noise model, a boundary estimation problem, the Gaussian sequence model and the high-dimensional linear regression model. For these specific statistical applications, different types of bias-variance trade-offs occur that vary considerably in their strength. For the trade-off between integrated squared bias and integrated variance in the Gaussian white noise model, we propose to combine the general strategy for lower bounds with a reduction technique. This allows us to reduce the original problem to a lower bound on the bias-variance trade-off for estimators with additional symmetry properties in a simpler statistical model. To highlight possible extensions of the proposed framework, we moreover briefly discuss the trade-off between bias and mean absolute deviation.
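For readers unfamiliar with the trade-off the abstract refers to, the following toy calculation may help. It is a textbook illustration, not the paper's method or bounds: it assumes a single observation X ~ N(theta, sigma^2) and the shrinkage estimator c * X, for which the bias is (c - 1) * theta and the variance is (c * sigma)^2. The function name and parameter values are hypothetical.

```python
def shrinkage_risk(c, theta, sigma=1.0):
    """Return (squared bias, variance, MSE) of the estimator c * X,
    where X ~ N(theta, sigma^2). Uses MSE = bias^2 + variance."""
    bias = (c - 1.0) * theta
    variance = (c * sigma) ** 2
    return bias ** 2, variance, bias ** 2 + variance


# Shrinking harder (smaller c) lowers variance but raises squared bias.
for c in (1.0, 0.5, 0.0):
    sq_bias, var, mse = shrinkage_risk(c, theta=2.0)
    print(f"c={c}: bias^2={sq_bias}, variance={var}, mse={mse}")
```

With theta = 2 and sigma = 1, the unbiased choice c = 1 has variance 1, while c = 0 removes all variance at the cost of a squared bias of 4. No value of c makes both terms small at once.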

artificial intelligence, bias-variance tradeoff, machine learning, (14 more...)

arXiv.org Machine Learning

2006.00278

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > California > Alameda County > Berkeley (0.14)
North America > Canada > Ontario > Toronto (0.14)
(7 more...)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.54)


Detecting Problem Statements in Peer Assessments

Xiao, Yunkai, Zingle, Gabriel, Jia, Qinjin, Shah, Harsh R., Zhang, Yi, Li, Tianyi, Karovaliya, Mohsin, Zhao, Weixiang, Song, Yang, Ji, Jie, Balasubramaniam, Ashwin, Patel, Harshit, Bhalasubbramanian, Priyankha, Patel, Vikram, Gehringer, Edward F.

arXiv.org Machine Learning | May-29-2020

Effective peer assessment requires students to be attentive to the deficiencies in the work they rate. Thus, their reviews should identify problems. But what ways are there to check that they do? We attempt to automate the process of deciding whether a review comment detects a problem. We use over 18,000 review comments that were labeled by the reviewees as either detecting or not detecting a problem with the work. We deploy several traditional machine-learning models, as well as neural-network models using GloVe and BERT embeddings. We find that the best performer is the Hierarchical Attention Network classifier, followed by the Bidirectional Gated Recurrent Units (GRU) Attention and Capsule model with scores of 93.1% and 90.5% respectively. The best non-neural network model was the support vector machine with a score of 89.71%. This is followed by the Stochastic Gradient Descent model and the Logistic Regression model with 89.70% and 88.98%.

artificial intelligence, classifier, machine learning, (18 more...)

arXiv.org Machine Learning

2006.04532

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > China > Shanghai > Shanghai (0.04)
North America > United States > North Carolina > Wake County > Raleigh (0.04)
(4 more...)

Genre: Research Report > New Finding (0.89)

Industry:

Education > Educational Setting > Online (1.00)
Education > Educational Technology > Educational Software > Computer Based Training (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.88)


Meta Clustering for Collaborative Learning

Ye, Chenglong, Ding, Jie, Ghanadan, Reza

arXiv.org Machine Learning | May-29-2020

A growing number of learning scenarios involve a set of learners/analysts, each equipped with a unique dataset and algorithm, who may collaborate with each other to enhance their learning performance. From the perspective of a particular learner, a careless collaboration with task-irrelevant other learners is likely to incur modeling error. A crucial problem is to search for the most appropriate collaborators so that their data and modeling resources can be effectively leveraged. Motivated by this, we propose to study the problem of 'meta clustering', where the goal is to identify subsets of relevant learners whose collaboration will improve the performance of each individual learner. In particular, we study the scenario where each learner is performing a supervised regression, and the meta clustering aims to categorize the underlying supervised relations (between responses and predictors) instead of the raw data. We propose a general method named Select-Exchange-Cluster (SEC) for performing such a clustering. Our method is computationally efficient as it does not require each learner to exchange their raw data. We prove that the SEC method can accurately cluster the learners into appropriate collaboration sets according to their underlying regression functions. Synthetic and real data examples show the desired performance and wide applicability of SEC to a variety of learning tasks. Index Terms: Distributed computing; Fairness; Meta clustering; Regression.

data mining, learner, machine learning, (18 more...)

arXiv.org Machine Learning

2006.00082

Country:

North America > United States > New York > New York County > New York City (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > Minnesota (0.04)
North America > United States > Kentucky (0.04)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
Education (0.66)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)


Machine learning time series regressions with an application to nowcasting

Babii, Andrii, Ghysels, Eric, Striaukas, Jonas

arXiv.org Machine Learning | May-28-2020

The statistical imprecision of quarterly gross domestic product (GDP) estimates, along with the fact that the first estimate is available with a delay of nearly a month, poses a significant challenge to policy makers, market participants, and other observers with an interest in monitoring the state of the economy in real time; see, e.g., Ghysels, Horan, and Moench (2018) for a recent discussion of macroeconomic data revision and publication delays. A term that originated in meteorology, nowcasting pertains to the prediction of the present and very near future. Nowcasting is intrinsically a mixed-frequency data problem, as the object of interest is a low-frequency data series (e.g., quarterly GDP), whereas the real-time information (e.g., daily, weekly, or monthly) can be used to update the state, or to put it differently, to nowcast the low-frequency series of interest. Traditional methods used for nowcasting rely on dynamic factor models that treat the underlying low-frequency series of interest as a latent process, with high-frequency data as noisy observations. These models are naturally cast in a state-space form and inference can be performed using likelihood-based methods and Kalman filtering techniques; see Bańbura, Giannone, Modugno, and Reichlin (2013) for a recent survey.

machine learning, polynomial, real time system, (19 more...)

arXiv.org Machine Learning

2005.14057

Country:

North America > United States > New York (0.14)
North America > United States > North Carolina > Orange County > Chapel Hill (0.14)
Europe > Denmark (0.14)
(17 more...)

Genre: Research Report (1.00)

Industry:

Law (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Banking & Finance > Trading (1.00)
Banking & Finance > Economy (1.00)

Technology:

Information Technology > Architecture > Real Time Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)


Breiman's "Two Cultures" Revisited and Reconciled

Mukhopadhyay, Subhadeep, Wang, Kaijun

arXiv.org Artificial Intelligence | May-27-2020

In a landmark paper published in 2001, Leo Breiman described the tense standoff between two cultures of data modeling: parametric statistical and algorithmic machine learning. The cultural division between these two statistical learning frameworks has been growing at a steady pace in recent years. What is the way forward? It has become blatantly obvious that this widening gap between "the two cultures" cannot be averted unless we find a way to blend them into a coherent whole. This article presents a solution by establishing a link between the two cultures. Through examples, we describe the challenges and potential gains of this new integrated statistical thinking.

artificial intelligence, machine learning, pyq, (18 more...)

arXiv.org Artificial Intelligence

2005.13596

Country:

North America > United States > New York (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Massachusetts (0.04)

Genre:

Research Report > Experimental Study (0.67)
Research Report > New Finding (0.67)

Industry:

Government (1.00)
Education (0.67)
Health & Medicine > Therapeutic Area > Endocrinology (0.46)
(2 more...)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)


Machine Learning for Beginners: Regression Analysis in Python #Udemy ($29.99 to Free)

#artificialintelligence | May-26-2020, 09:47:48 GMT

What are the meanings or different terms associated with machine learning? You will see some examples so that you understand what machine learning actually is. It also contains steps involved in building a machine learning model, not just linear models, any machine learning model.

artificial intelligence, machine learning, python, (11 more...)

#artificialintelligence

Genre:

Instructional Material > Course Syllabus & Notes (1.00)
Research Report (0.86)

Industry:

Education > Educational Setting > Online (0.65)
Education > Educational Technology > Educational Software > Computer Based Training (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)


Using Machine Learning to Forecast Future Earnings

Cui, Xinyue, Xu, Zhaoyu, Zhou, Yue

arXiv.org Machine Learning | May-26-2020

In this essay, we comprehensively evaluate the feasibility and suitability of adopting machine learning models to forecast corporate fundamentals (i.e., earnings), comparing the predictions of our method with both analysts' consensus estimates and traditional statistical models. As a result, our model proves capable of serving as a useful auxiliary tool that helps analysts make better predictions of company fundamentals. Compared with traditional statistical models widely adopted in the industry, such as logistic regression, our method achieves satisfactory improvements in both prediction accuracy and speed. We are also confident that this model still has considerable room to evolve, and we hope that in the near future machine learning models will perform even better relative to professional analysts.

artificial intelligence, earnings, machine learning, (18 more...)

arXiv.org Machine Learning

2005.13995

Country:

North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
North America > United States > Virginia > Albemarle County > Charlottesville (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
(2 more...)

Genre:

Research Report > New Finding (0.49)
Research Report > Experimental Study (0.49)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)


Review of Mathematical frameworks for Fairness in Machine Learning

del Barrio, Eustasio, Gordaliza, Paula, Loubes, Jean-Michel

arXiv.org Machine Learning | May-26-2020

With both the introduction of new ways of storing, sharing and streaming data and the drastic growth in the capacity of computers to handle large computations, the conception of models has changed. Mathematical models were first designed following prior ideas or conjectures from physical or biological models, then tested by designing experiments to check the validity of their inventors' ideas. The model holds until new observations make it possible to reject its assumptions. The so-called Big Data era introduced a new paradigm. The observed data convey enough information to understand the complexity of real life, and the more data there are, the better the description of reality. Hence building models optimised to fit the data has become an efficient way to obtain generalizable models able to describe and forecast the real world. In this framework, the principle of supervised machine learning is to build a decision rule that fits the data from a set of labeled examples called the learning sample.

artificial intelligence, data mining, machine learning, (16 more...)

arXiv.org Machine Learning

2005.13755

Country:

North America > United States > California (0.04)
Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)
Europe > Spain > Castile and León > Valladolid Province > Valladolid (0.04)
(2 more...)

Genre: Research Report > Experimental Study (0.68)

Industry: Law (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.93)
(2 more...)
