AITopics | Regression

Regression

Active Linear Regression

Fontaine, Xavier, Perrault, Pierre, Perchet, Vianney

arXiv.org Machine Learning, Jun-20-2019

We consider the problem of active linear regression where a decision maker has to choose between several covariates to sample in order to obtain the best estimate $\hat{\beta}$ of the parameter $\beta^{\star}$ of the linear model, in the sense of minimizing $\mathbb{E} \lVert\hat{\beta}-\beta^{\star}\rVert^2$. Using bandit and convex optimization techniques we propose an algorithm to define the sampling strategy of the decision maker and we compare it with other algorithms. We provide theoretical guarantees of our algorithm in different settings, including a $\mathcal{O}(T^{-2})$ regret bound in the case where the covariates form a basis of the feature space, generalizing and improving existing results. Numerical experiments validate our theoretical findings.

active linear regression, artificial intelligence, machine learning

arXiv.org Machine Learning

1906.08509

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.60)
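
The allocation question in the Active Linear Regression abstract above can be illustrated in a heavily simplified setting. This is a minimal sketch, not the authors' algorithm: it assumes the covariates form an orthonormal basis, that each covariate k has its own noise level `sigma_k`, and that these noise levels are known in advance (all of these are assumptions; the abstract does not spell out the noise model). Under those assumptions, sampling covariate k a fraction p_k of T times gives expected squared error equal to the sum of sigma_k^2 / (p_k T), which is minimized by p_k proportional to sigma_k.

```python
import numpy as np

def expected_error(sigmas, proportions, budget):
    """Sum of per-coordinate variances sigma_k^2 / n_k, with n_k = proportions[k] * budget."""
    sigmas = np.asarray(sigmas, dtype=float)
    proportions = np.asarray(proportions, dtype=float)
    return float(np.sum(sigmas ** 2 / (proportions * budget)))

def oracle_proportions(sigmas):
    """Minimizer of sum(sigma_k^2 / p_k) over the simplex: p_k proportional to sigma_k."""
    sigmas = np.asarray(sigmas, dtype=float)
    return sigmas / sigmas.sum()

sigmas = [0.5, 1.0, 2.0]   # hypothetical noise levels, assumed known here
budget = 1000              # total number of samples T

uniform = np.full(len(sigmas), 1.0 / len(sigmas))
oracle = oracle_proportions(sigmas)

loss_uniform = expected_error(sigmas, uniform, budget)
loss_oracle = expected_error(sigmas, oracle, budget)
print(loss_uniform, loss_oracle)
```

The oracle allocation spends more samples on the noisier covariates and never does worse than uniform sampling; a practical strategy would have to estimate the noise levels while sampling.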

Bayesian inverse regression for supervised dimension reduction with small datasets

Cai, Xin, Lin, Guang, Li, Jinglai

arXiv.org Machine Learning, Jun-19-2019

We consider supervised dimension reduction problems, namely to identify a low dimensional projection of the predictors $\mathbf{x}$ which can retain the statistical relationship between $\mathbf{x}$ and the response variable $y$. We follow the idea of the sliced inverse regression (SIR) class of methods, which is to use the statistical information of the conditional distribution $\pi(\mathbf{x}|y)$ to identify the dimension reduction (DR) space and in particular we focus on the task of computing this conditional distribution. We propose a Bayesian framework to compute the conditional distribution where the likelihood function is obtained using the Gaussian process regression model. The conditional distribution $\pi(\mathbf{x}|y)$ can then be obtained directly by assigning weights to the original data points. We then can perform DR by considering certain moment functions (e.g. the first moment) of the samples of the posterior distribution. With numerical examples, we demonstrate that the proposed method is especially effective for small data problems.

artificial intelligence, machine learning, regression, (18 more...)

arXiv.org Machine Learning

1906.08018

Country:

Asia > China > Shanghai > Shanghai (0.04)
North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
North America > United States > Indiana > Tippecanoe County > Lafayette (0.04)
(2 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)

Data-Driven Malaria Prevalence Prediction in Large Densely-Populated Urban Holoendemic sub-Saharan West Africa: Harnessing Machine Learning Approaches and 22-years of Prospectively Collected Data

Brown, Biobele J., Przybylski, Alexander A., Manescu, Petru, Caccioli, Fabio, Oyinloye, Gbeminiyi, Elmi, Muna, Shaw, Michael J., Pawar, Vijay, Claveau, Remy, Shawe-Taylor, John, Srinivasan, Mandayam A., Afolabi, Nathaniel K., Orimadegun, Adebola E., Ajetunmobi, Wasiu A., Akinkunmi, Francis, Kowobari, Olayinka, Osinusi, Kikelomo, Akinbami, Felix O., Omokhodion, Samuel, Shokunbi, Wuraola A., Lagunju, Ikeoluwa, Sodeinde, Olugbemiro, Fernandez-Reyes, Delmiro

arXiv.org Machine Learning, Jun-18-2019

Plasmodium falciparum malaria still poses one of the greatest threats to human life, with over 200 million cases globally leading to half a million deaths annually. Of these, 90% of cases and deaths occur in sub-Saharan Africa, mostly among children. Although malaria prediction systems are central to the 2016-2030 malaria Global Technical Strategy, currently these are inadequate at capturing and estimating the burden of disease in highly endemic countries. We developed and validated a computational system that exploits the predictive power of current Machine Learning approaches on 22 years of prospective data from the high-transmission, holoendemic, densely populated urban sub-Saharan West African metropolis of Ibadan. Our dataset of $>9\times10^{4}$ screened study participants attending our clinical and community services from 1996 to 2017 contains monthly prevalence, temporal, environmental and host features. Our Locality-specific Elastic-Net based Malaria Prediction System (LEMPS) achieves good generalization performance, both in magnitude and direction of the prediction, when tasked to predict monthly prevalence on previously unseen validation data (MAE $\le 6\times10^{-2}$, MSE $\le 7\times10^{-3}$) within an error tolerance of +0.1 to -0.05, which is relevant and usable for aiding decision support in a holoendemic setting. LEMPS is well suited for malaria prediction, where there are multiple features which are correlated with one another, and trading off between the L1-norm and L2-norm regularization strengths allows the system to retain stability. Data-driven systems are critical for regionally adaptable surveillance, management of control strategies and resource allocation across stretched healthcare systems.

artificial intelligence, machine learning, prevalence, (17 more...)

arXiv.org Machine Learning

1906.07502

Country:

Africa > West Africa (0.61)
Africa > Nigeria > Oyo State > Ibadan (0.30)
Africa > Sub-Saharan Africa (0.24)
(2 more...)

Genre: Research Report > Experimental Study (0.88)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)
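
The malaria abstract above relies on the elastic net, which mixes L1 and L2 penalties so that correlated features share weight instead of one being dropped arbitrarily. The following is a minimal sketch of that idea, not the LEMPS system: it fits the objective (1/2n)·||y − Xw||² + alpha·l1_ratio·||w||₁ + (alpha·(1 − l1_ratio)/2)·||w||² by coordinate descent on synthetic data. The function names, the penalty parameterization, and the data are all assumptions made for illustration.

```python
import numpy as np

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the proximal step for the L1 penalty)."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)

def elastic_net(X, y, alpha=0.1, l1_ratio=0.5, n_iter=200):
    """Coordinate descent for the elastic-net objective stated in the lead-in."""
    n, d = X.shape
    w = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0) / n          # (1/n) * x_j . x_j for each column
    residual = y - X @ w
    for _ in range(n_iter):
        for j in range(d):
            residual += X[:, j] * w[j]         # remove feature j's current contribution
            rho = X[:, j] @ residual / n       # correlation of feature j with the partial residual
            w[j] = soft_threshold(rho, alpha * l1_ratio) / (col_sq[j] + alpha * (1.0 - l1_ratio))
            residual -= X[:, j] * w[j]         # add the updated contribution back
    return w

# Synthetic data: two nearly identical features plus one irrelevant feature.
rng = np.random.default_rng(0)
n = 200
z = rng.normal(size=n)
X = np.column_stack([
    z + 0.01 * rng.normal(size=n),
    z + 0.01 * rng.normal(size=n),
    rng.normal(size=n),
])
y = 2.0 * z + 0.1 * rng.normal(size=n)

weights = elastic_net(X, y, alpha=0.1, l1_ratio=0.5)
print(weights)
```

In this setup the two correlated features should end up with similar weights while the irrelevant feature is shrunk to (or near) zero; a pure L1 penalty would typically keep only one of the correlated pair.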

Model selection for high-dimensional linear regression with dependent observations

Ing, Ching-Kang

arXiv.org Machine Learning, Jun-18-2019

We investigate the prediction capability of the orthogonal greedy algorithm (OGA) in high-dimensional regression models with dependent observations. The rates of convergence of the prediction error of OGA are obtained under a variety of sparsity conditions. To prevent OGA from overfitting, we introduce a high-dimensional Akaike's information criterion (HDAIC) to determine the number of OGA iterations. A key contribution of this work is to show that OGA, used in conjunction with HDAIC, can achieve the optimal convergence rate without knowledge of how sparse the underlying high-dimensional model is.

artificial intelligence, logp, machine learning, (19 more...)

arXiv.org Machine Learning

1906.07395

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
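
The orthogonal greedy algorithm (OGA) named in the abstract above can be sketched in a few lines: at each step it selects the column most correlated with the current residual, then refits least squares on all columns selected so far. This is a minimal illustration on synthetic, independent data; the HDAIC stopping rule from the paper is not reproduced, so the number of steps is passed in by hand, and the function name `orthogonal_greedy` is an assumption.

```python
import numpy as np

def orthogonal_greedy(X, y, n_steps):
    """Greedy forward selection with a full least-squares refit after each pick."""
    n, p = X.shape
    norms = np.linalg.norm(X, axis=0)
    y = np.asarray(y, dtype=float)
    residual = y.copy()
    active = []
    coef = np.zeros(p)
    for _ in range(n_steps):
        scores = np.abs(X.T @ residual) / norms   # normalized correlation with the residual
        if active:
            scores[active] = -np.inf              # never pick the same column twice
        active.append(int(np.argmax(scores)))
        beta, *_ = np.linalg.lstsq(X[:, active], y, rcond=None)
        residual = y - X[:, active] @ beta        # residual is orthogonal to the chosen columns
    if active:
        coef[active] = beta
    return active, coef

# Synthetic sparse model with more predictors than observations.
rng = np.random.default_rng(0)
n, p = 200, 500
X = rng.normal(size=(n, p))
true_coef = np.zeros(p)
true_coef[[3, 50, 200]] = [3.0, -2.0, 1.5]
y = X @ true_coef + 0.1 * rng.normal(size=n)

active, coef = orthogonal_greedy(X, y, n_steps=3)
print(sorted(active))
```

Running more steps than the true sparsity keeps reducing the training residual while adding spurious columns, which is the overfitting that an information criterion such as the paper's HDAIC is meant to prevent.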

Exact and Consistent Interpretation of Piecewise Linear Models Hidden behind APIs: A Closed Form Solution

Cong, Zicun, Chu, Lingyang, Wang, Lanjun, Hu, Xia, Pei, Jian

arXiv.org Machine Learning, Jun-17-2019

More and more AI services are provided through cloud APIs, where the predictive models are hidden behind the API. To build trust with users and reduce potential application risk, it is important to interpret how such hidden predictive models make their decisions. The biggest challenge of interpreting such predictions is that no access to model parameters or training data is available. Existing works interpret the predictions of a model hidden behind an API by heuristically probing the response of the API with perturbed input instances. However, these methods do not provide any guarantee on the exactness and consistency of their interpretations. In this paper, we propose an elegant closed form solution named OpenAPI to compute exact and consistent interpretations for the family of Piecewise Linear Models (PLM), which includes many popular classification models. The major idea is to first construct a set of overdetermined linear equation systems with a small set of perturbed instances and the predictions made by the model on those instances. Then, we solve the equation systems to identify the decision features that are responsible for the prediction on an input instance. Our extensive experiments clearly demonstrate the exactness and consistency of our method.

artificial intelligence, machine learning, modeling & simulation, (20 more...)

arXiv.org Machine Learning

1906.06857

Country:

Europe > Italy > Marche > Ancona Province > Ancona (0.04)
North America > United States > New York (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)
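
The core idea in the OpenAPI abstract above, recovering a local linear rule from predictions on perturbed inputs by solving an overdetermined linear system, can be sketched as follows. This is an illustration under stated assumptions, not the authors' method: the hidden model is a hypothetical two-unit ReLU network with a sigmoid output, the query point is chosen so that small perturbations stay inside one linear region, and `black_box_predict` and `local_linear_explanation` are invented names.

```python
import numpy as np

# Hypothetical hidden model: a tiny ReLU network followed by a sigmoid.
_W1 = np.array([[1.0, -1.0], [0.5, 2.0]])
_b1 = np.array([0.5, -1.0])
_w2 = np.array([2.0, -3.0])
_b2 = 0.1

def black_box_predict(x):
    """Stand-in for an API call: returns P(class 1 | x) and nothing else."""
    hidden = np.maximum(_W1 @ x + _b1, 0.0)
    return 1.0 / (1.0 + np.exp(-(_w2 @ hidden + _b2)))

def local_linear_explanation(predict, x, n_probes=10, radius=1e-2, seed=0):
    """Fit log-odds = weights . (x' - x) + value_at_x from probes x' near x."""
    rng = np.random.default_rng(seed)
    offsets = radius * rng.uniform(-1.0, 1.0, size=(n_probes, x.size))
    probs = np.array([predict(x + o) for o in offsets])
    log_odds = np.log(probs / (1.0 - probs))
    # More probes than unknowns: an overdetermined system solved by least squares.
    design = np.column_stack([offsets, np.ones(n_probes)])
    solution, *_ = np.linalg.lstsq(design, log_odds, rcond=None)
    weights, value_at_x = solution[:-1], solution[-1]
    # A near-zero residual indicates that all probes fell in one linear region.
    max_residual = float(np.max(np.abs(design @ solution - log_odds)))
    return weights, value_at_x, max_residual

x0 = np.array([1.0, 1.0])
weights, value_at_x, max_residual = local_linear_explanation(black_box_predict, x0)
print(weights, value_at_x, max_residual)
```

At `x0` both hidden units are active, so the true local log-odds are 0.5·x₁ − 8·x₂ + 4.1, and the recovered weights can be checked against that. A large residual would signal that the probes straddle a region boundary and the radius should be reduced.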

Exploiting Unsupervised Pre-training and Automated Feature Engineering for Low-resource Hate Speech Detection in Polish

Korzeniowski, Renard, Rolczyński, Rafał, Sadownik, Przemysław, Korbak, Tomasz, Możejko, Marcin

arXiv.org Machine Learning, Jun-17-2019

This paper presents our contribution to PolEval 2019 Task 6: Hate speech and bullying detection. We describe three parallel approaches that we followed: fine-tuning a pre-trained ULMFiT model to our classification task, fine-tuning a pre-trained BERT model to our classification task, and using the TPOT library to find the optimal pipeline. We present results achieved by these three tools and review their advantages and disadvantages in terms of user experience. Our team placed second in subtask 2 with a shallow model found by TPOT: a logistic regression classifier with non-trivial feature engineering.

artificial intelligence, language model, machine learning, (13 more...)

arXiv.org Machine Learning

1906.09325

Country:

North America > United States (0.14)
Europe > Portugal (0.14)
Europe > Poland (0.14)

Genre: Research Report > New Finding (0.51)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.72)

The Cells Out of Sample (COOS) dataset and benchmarks for measuring out-of-sample generalization of image classifiers

Lu, Alex X., Lu, Amy X., Schormann, Wiebke, Andrews, David W., Moses, Alan M.

arXiv.org Machine Learning, Jun-17-2019

Understanding if classifiers generalize to out-of-sample datasets is a central problem in machine learning. Microscopy images provide a standardized way to measure the generalization capacity of image classifiers, as we can image the same classes of objects under increasingly divergent, but controlled factors of variation. We created a public dataset of 132,209 images of mouse cells, COOS-7 (Cells Out Of Sample 7-Class). COOS-7 provides a classification setting where four test datasets have increasing degrees of covariate shift: some images are random subsets of the training data, while others are from experiments reproduced months later and imaged by different instruments. We benchmarked a range of classification models using different representations, including transferred neural network features, end-to-end classification with a supervised deep CNN, and features from a self-supervised CNN. While most classifiers perform well on test datasets similar to the training dataset, all classifiers failed to generalize their performance to datasets with greater covariate shifts. These baselines highlight the challenges of covariate shifts in image data, and establish metrics for improving the generalization capacity of image classifiers.

artificial intelligence, dataset, machine learning, (15 more...)

arXiv.org Machine Learning

1906.07282

Country: North America > Canada > Ontario > Toronto (0.29)

Genre: Research Report (0.86)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.33)

Linear vs Polynomial Regression Walk-Through

#artificialintelligence, Jun-16-2019, 23:50:06 GMT

Fish get bigger as they get older. How well does age (yr), as the explanatory variable, predict fish length (cm)? Is the relationship best fit with a linear regression? First, let's bring in the data and a few important modules for the analysis. There are 77 instances in the data set. Now let's visualize the scatter-plot.

artificial intelligence, linear vs polynomial regression walk-through, machine learning, (10 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.39)
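
The comparison the walk-through above sets up, a straight line versus a polynomial for length against age, might look roughly like this. The original data set and code are not included in this excerpt, so the sketch below generates 77 synthetic points (the ages, coefficients, and noise level are invented purely for illustration) and compares degree-1 and degree-2 fits by R².

```python
import numpy as np

def r_squared(y, y_hat):
    """Fraction of variance in y explained by the fitted values y_hat."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Synthetic stand-in for the fish data: growth that slows with age.
rng = np.random.default_rng(0)
age = rng.integers(1, 7, size=77).astype(float)                               # ages 1..6 yr
length = 6.0 + 4.5 * age - 0.35 * age ** 2 + rng.normal(scale=1.0, size=77)   # cm

linear_fit = np.polyfit(age, length, deg=1)
quadratic_fit = np.polyfit(age, length, deg=2)

r2_linear = r_squared(length, np.polyval(linear_fit, age))
r2_quadratic = r_squared(length, np.polyval(quadratic_fit, age))
print(r2_linear, r2_quadratic)
```

On the training data the quadratic fit can never score a lower R² than the linear one, because the linear model is a special case of it. Deciding which is the better model therefore needs a held-out set or a criterion that penalizes the extra parameter.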

Learning When-to-Treat Policies

Nie, Xinkun, Brunskill, Emma, Wager, Stefan

arXiv.org Machine Learning, Jun-15-2019

Any solution to the "policy learning" problem needs to deal with numerous difficulties, including how to incorporate robustness to potential selection bias as well as fairness constraints articulated by stakeholders, and there have been several notable advances that address these difficulties over the past few years. One limitation of this line of work, however, is that existing results all focus on a static setting where a decision-maker only sees each subject once and immediately decides how to treat the subject. In contrast, many problems of applied interest involve a dynamic component whereby the decision-maker makes a series of decisions based on time-varying covariates. In medicine, if a patient has a disease for which all known cures are invasive and have serious side effects, their doctor may choose to monitor disease progression for some time before prescribing one of these invasive treatments. Meanwhile, a health inspector needs to not only choose which restaurants to inspect, but also when to carry out these inspections.

machine learning, policy class, reinforcement learning, (15 more...)

arXiv.org Machine Learning

1905.09751

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Epidemiology (0.66)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Agriculture Commodity Arrival Prediction using Remote Sensing Data: Insights and Beyond

Prasad, Gautam, Vuyyuru, Upendra Reddy, Gupta, Mithun Das

arXiv.org Machine Learning, Jun-14-2019

In developing countries like India, agriculture plays an extremely important role in the lives of the population. In India, around 80% of the population depend on agriculture or its by-products as the primary means of employment. Given the large population dependency on agriculture, it becomes extremely important for the government to estimate market factors in advance and prepare for any deviation from those estimates. Commodity arrivals to market are an extremely important factor, captured at district level throughout the country. Historical data and short-term prediction of important variables such as arrivals, prices, crop quality etc. for commodities are used by the government to take proactive steps and decide various policy measures. In this paper, we present a framework to work with short time series in conjunction with remote sensing data to predict future commodity arrivals. We deal with extremely high dimensional data whose dimension exceeds the number of observations by multiple orders of magnitude. We use cascaded layers of dimensionality reduction techniques combined with regularized regression models for prediction. We present results to predict arrivals to major markets and statewide prices for the 'Tur' (red gram) crop in Karnataka, India. Our model consistently beats popular ML techniques on many instances. Our model is scalable, time efficient and can be generalized to many other crops and regions. We draw multiple insights from the regression parameters, some of which are important aspects to consider when predicting more complex quantities such as prices in the future. We also combine the insights to generate important recommendations for different government organizations.

data mining, machine learning, prediction, (11 more...)

arXiv.org Machine Learning

1906.07573

Country:

North America > United States (0.94)
Asia > India > Karnataka (0.28)

Genre: Research Report > New Finding (0.46)

Industry:

Food & Agriculture > Agriculture (1.00)
Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.73)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)
