AITopics

2001.06027

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)

Baran, Ágnes, Lerch, Sebastian, Ayari, Mehrez El, Baran, Sándor

Machine learning for total cloud cover prediction

arXiv.org Machine Learning, Jan-16-2020

Accurate and reliable forecasting of total cloud cover (TCC) is vital for many areas, such as astronomy, energy demand and production, and agriculture. Most meteorological centres issue ensemble forecasts of TCC; however, these forecasts are often uncalibrated and exhibit worse forecast skill than ensemble forecasts of other weather variables. Hence, some form of post-processing is strongly required to improve predictive performance. As TCC observations are usually reported on a discrete scale taking just nine different values called oktas, statistical calibration of TCC ensemble forecasts can be considered a classification problem whose outputs are the probabilities of the oktas. This is a classical area where machine learning methods are applied. We investigate the performance of post-processing using multilayer perceptron (MLP) neural networks, gradient boosting machines (GBM) and random forest (RF) methods. Based on the European Centre for Medium-Range Weather Forecasts global TCC ensemble forecasts for 2002-2014, we compare these approaches with the proportional odds logistic regression (POLR) and multiclass logistic regression (MLR) models, as well as with the raw TCC ensemble forecasts. We further assess whether improvements in forecast skill can be obtained by incorporating ensemble forecasts of precipitation as an additional predictor. Compared to the raw ensemble, all calibration methods result in a significant improvement in forecast skill. RF models provide the smallest increase in predictive performance, while the MLP, POLR and GBM approaches perform best.
Key words: ensemble calibration; gradient boosting machine; logistic regression; multilayer perceptron; random forest; total cloud cover

1 Introduction

Reliable and accurate prediction of total cloud cover (TCC) is of principal importance in observational astronomy (Ye and Chen, 2013) and in the prediction of photovoltaic energy production, as cloud cover is the main cause of variation in solar-radiation energy supply (Matuszko, 2012; McEvoy et al., 2012). It is also highly relevant to agriculture, tourism and other sectors of the economy.
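The abstract treats TCC calibration as a nine-class problem and uses proportional odds logistic regression (POLR) as one of the benchmarks. The sketch below shows how a POLR model turns a single linear predictor into okta probabilities. It assumes the cumulative-logit parametrisation logit P(Y <= k) = theta_k - eta, which is the one used, for example, by R's MASS::polr. The two predictors (ensemble mean TCC and the fraction of clear-sky members), the coefficients and the cut-points are all hypothetical placeholders, not values fitted in the paper.

```python
import math


def logistic(z: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def polr_okta_probabilities(features, coefs, cut_points):
    """Return P(TCC = k oktas) for k = 0..8 under a proportional odds model.

    Assumed parametrisation: logit P(Y <= k | x) = cut_points[k] - x . coefs
    for k = 0..7, and P(Y <= 8) = 1. The eight cut-points must increase.
    """
    if len(cut_points) != 8:
        raise ValueError("nine okta classes need exactly eight cut-points")
    if any(b <= a for a, b in zip(cut_points, cut_points[1:])):
        raise ValueError("cut-points must be strictly increasing")
    eta = sum(x * b for x, b in zip(features, coefs))  # linear predictor
    cumulative = [logistic(c - eta) for c in cut_points] + [1.0]
    # Class probabilities are successive differences of the cumulative ones.
    return [cumulative[0]] + [cumulative[k] - cumulative[k - 1] for k in range(1, 9)]


# Hypothetical predictors: ensemble mean TCC (oktas) and the fraction of
# ensemble members forecasting clear sky. Coefficients are made up.
coefs = [1.0, -2.0]
cut_points = [k + 0.5 for k in range(8)]
cloudy = polr_okta_probabilities([6.0, 0.1], coefs, cut_points)
clear = polr_okta_probabilities([1.0, 0.8], coefs, cut_points)
print([round(p, 3) for p in cloudy])
```

In a real post-processing model, the coefficients and cut-points would be estimated by maximum likelihood on past forecast-observation pairs. A multiclass method such as MLR, GBM, RF or an MLP instead outputs the nine-element probability vector directly, without the ordinal structure that POLR imposes.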

ensemble forecast, forecast, tcc ensemble forecast, (15 more...)

2001.05948

Country:

Europe > Hungary > Hajdú-Bihar County > Debrecen (0.04)
North America > United States > New York (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)

Genre: Research Report > Experimental Study (0.75)

Industry: Energy > Renewable > Solar (0.88)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.75)

arXiv.org Artificial Intelligence, Jan-16-2020

#MeToo on Campus: Studying College Sexual Assault at Scale Using Data Reported on Social Media

Duong, Viet, Pham, Phu, Bose, Ritwik, Luo, Jiebo

Recently, the emergence of the #MeToo trend on social media has empowered thousands of people to share their own sexual harassment experiences. This viral trend, in conjunction with the massive amount of personal information and content available on Twitter, presents a promising opportunity to extract data-driven insights that complement the ongoing survey-based studies of sexual harassment in college. In this paper, we analyze the influence of the #MeToo trend on a pool of college followers. The results show that the majority of topics embedded in those #MeToo tweets detail sexual harassment stories, and that there is a significant correlation between the prevalence of this trend and official reports in several major geographical regions. Furthermore, using deep semantic meaning representations, we identify the prominent sentiments of the #MeToo tweets and their implications for affected users experiencing different types of sexual harassment. We hope this study can raise further awareness of sexual misconduct in academia.

sexual harassment, tweet, twitter, (10 more...)

arXiv.org Artificial Intelligence

2001.0597

Country:

North America > United States > Utah (0.04)
Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)

Genre: Research Report > New Finding (0.66)

Industry:

Law > Criminal Law (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)

#artificialintelligence, Jan-15-2020, 09:06:04 GMT

Machine Learning & Tensorflow - Google Cloud Approach

Intended audience:

Students with at least high-school knowledge of math who want to start learning machine learning.
Intermediate learners who know the basics of machine learning, including classical algorithms such as linear and logistic regression, and want to learn more and explore the different fields of machine learning.
People who are not comfortable with coding but are interested in machine learning and want to apply it easily to datasets.
Anyone willing to learn machine learning on Google Cloud Platform.
College students who want to start a career in data science.
Data analysts who want to level up in machine learning.

google cloud approach, machine learning, machine learning & tensorflow, (7 more...)

#artificialintelligence

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Industry: Education > Educational Setting > Online (0.87)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.41)

#artificialintelligence, Jan-14-2020, 17:40:01 GMT

Machine Learning for CEOs

When I worked as a McKinsey consultant, I served the CEO of a bank regarding his small business strategy. I wanted to run regressions on the bank's data but I was advised against it: "They don't even understand statistics. How are you going to explain a regression to them?". CEOs have always needed to deeply understand human intelligence and emotion to manage enterprise teams. Now machines and algorithms are increasingly becoming part of these very teams.

algorithm, machine learning, neural network, (13 more...)

#artificialintelligence

Country: Asia > South Korea (0.05)

Industry:

Health & Medicine (0.51)
Leisure & Entertainment > Games (0.30)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.31)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Unsupervised Pool-Based Active Learning for Linear Regression

Liu, Ziang, Wu, Dongrui

In many real-world machine learning applications, unlabeled data can be easily obtained, but it is very time-consuming and/or expensive to label them. So, it is desirable to be able to select the optimal samples to label, so that a good machine learning model can be trained from a minimum amount of labeled data. Active learning (AL) has been widely used for this purpose. However, most existing AL approaches are supervised: they train an initial model from a small amount of labeled samples, query new samples based on the model, and then update the model iteratively. Few of them have considered the completely unsupervised AL problem, i.e., starting from zero, how to optimally select the very first few samples to label, without knowing any label information at all. This problem is very challenging, as no label information can be utilized. This paper studies unsupervised pool-based AL for linear regression problems. We propose a novel AL approach that considers simultaneously the informativeness, representativeness, and diversity, three essential criteria in AL. Extensive experiments on 14 datasets from various application domains, using three different linear regression models (ridge regression, LASSO, and linear support vector regression), demonstrated the effectiveness of our proposed approach.
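The abstract does not spell out the proposed criteria in detail, so the following sketch only illustrates the problem setting. It is a simpler, label-free greedy selector that balances two of the three criteria. For representativeness, it starts at the point closest to the pool centroid. For diversity, it then repeatedly picks the point farthest from everything chosen so far, in the spirit of greedy input-space sampling baselines such as GSx. It ignores informativeness, which the authors' method also considers. The pool coordinates are hypothetical.

```python
import math


def greedy_input_sampling(pool, k):
    """Choose k pool indices to label, using only the unlabeled inputs.

    The first pick is the point closest to the pool centroid (representative);
    each later pick maximises the distance to its nearest already-chosen point
    (diverse). Labels are never consulted.
    """
    n = len(pool)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the pool size")
    dim = len(pool[0])
    centroid = [sum(p[d] for p in pool) / n for d in range(dim)]
    first = min(range(n), key=lambda i: math.dist(pool[i], centroid))
    selected = [first]
    # nearest[i]: distance from pool[i] to its closest selected point
    nearest = [math.dist(p, pool[first]) for p in pool]
    while len(selected) < k:
        candidates = [i for i in range(n) if i not in selected]
        nxt = max(candidates, key=lambda i: nearest[i])
        selected.append(nxt)
        nearest = [min(nearest[i], math.dist(pool[i], pool[nxt])) for i in range(n)]
    return selected


# Toy 2-D pool (hypothetical): two input features per unlabeled sample.
pool = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (10.0, 10.0), (10.0, 9.9)]
chosen = greedy_input_sampling(pool, 3)
print(chosen)
```

The chosen samples would then be labeled and used to fit one of the linear models named in the abstract, such as ridge regression, LASSO or linear support vector regression.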

artificial intelligence, dataset, upstream oil & gas, (19 more...)

2001.05028

Country:

Europe > Hungary (0.14)
Oceania > Australia (0.14)
North America > United States > Wisconsin (0.14)
(3 more...)

Genre: Research Report (1.00)

Industry: Energy > Oil & Gas > Upstream (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Private Machine Learning via Randomised Response

Barber, David

We introduce a general learning framework for private machine learning based on randomised response. Our assumption is that all actors are potentially adversarial, and as such we trust only the release of a single noisy version of an individual's datapoint. We discuss a general approach that yields a consistent estimate of the true underlying machine learning model and demonstrate it in the case of logistic regression.
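Randomised response can be illustrated in its classical binary form. Each individual releases their true bit with probability p and the flipped bit otherwise. Because the noise process is known, the analyst can invert it: E[report] = p·π + (1 − p)(1 − π), so π = (E[report] − (1 − p)) / (2p − 1). This is a minimal sketch of that idea for estimating a single proportion. It is not Barber's framework or its logistic-regression estimator. The values of p_truth, the seed and the simulated population are assumptions chosen for illustration.

```python
import random


def randomise(bit, p_truth, rng):
    """Release the true bit with probability p_truth, otherwise its flip."""
    return bit if rng.random() < p_truth else 1 - bit


def debiased_rate(reports, p_truth):
    """Consistent estimate of the true proportion of ones.

    Inverts E[report] = p_truth * pi + (1 - p_truth) * (1 - pi).
    """
    if p_truth == 0.5:
        raise ValueError("p_truth = 0.5 makes reports independent of the data")
    observed = sum(reports) / len(reports)
    return (observed - (1 - p_truth)) / (2 * p_truth - 1)


rng = random.Random(0)  # fixed seed so the simulation is reproducible
# Simulated population (hypothetical): about 30% of individuals have bit = 1.
true_bits = [1 if rng.random() < 0.3 else 0 for _ in range(100_000)]
reports = [randomise(b, 0.75, rng) for b in true_bits]  # only these are released
estimate = debiased_rate(reports, 0.75)
print(round(estimate, 3))
```

The raw mean of the released reports is biased toward 0.5, while the debiased estimate converges to the true proportion as the number of individuals grows.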

artificial intelligence, datapoint, machine learning, (17 more...)

2001.04942

Country:

North America > United States (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.36)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)

Zhang, Yao, Bellot, Alexis, van der Schaar, Mihaela

Learning Overlapping Representations for the Estimation of Individualized Treatment Effects

The decision to make an intervention depends on its potential benefit or harm in comparison to alternatives. Estimating the likely outcome of alternatives from observational data is a challenging problem, because the outcomes under all alternatives are never observed for the same individual, and selection bias precludes the direct comparison of differently intervened groups. We show that, despite their empirical success, algorithms that learn domain-invariant representations of inputs (on which to make predictions) are often inappropriate, and we develop generalization bounds that demonstrate the dependence on domain overlap and highlight the need for invertible latent maps. Based on these results, we develop a deep kernel regression algorithm and a posterior regularization framework that substantially outperform the state of the art on a variety of benchmark data sets.
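The target quantity can be written in standard potential-outcomes notation. The symbols here are ours, not necessarily the paper's. For covariates X, a binary treatment T and potential outcomes Y(0) and Y(1), only one potential outcome is ever observed per individual. Under unconfoundedness and overlap, the individualized effect can be expressed through observable regressions:

```latex
\tau(x) = \mathbb{E}\left[\, Y(1) - Y(0) \mid X = x \,\right],
\qquad
Y = T\,Y(1) + (1 - T)\,Y(0), \quad T \in \{0, 1\},

\{Y(0), Y(1)\} \perp T \mid X, \qquad 0 < P(T = 1 \mid X = x) < 1
\;\Longrightarrow\;
\tau(x) = \mathbb{E}[\, Y \mid X = x, T = 1 \,] - \mathbb{E}[\, Y \mid X = x, T = 0 \,].
```

Both conditional expectations must be estimated at the same x. Where the treated and control covariate distributions barely overlap, one of the two terms has to be extrapolated, which is one way to see why domain overlap matters.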

representation, treatment effect, variance, (12 more...)

2001.04754

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > Italy > Sicily > Palermo (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.84)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.66)

Park, Chiwoo, Borth, David J., Wilson, Nicholas S., Hunter, Chad N., Friedersdorf, Fritz J.

Robust Gaussian Process Regression with a Bias Model

This paper presents a new approach to robust Gaussian process (GP) regression. Most existing approaches replace an outlier-prone Gaussian likelihood with a non-Gaussian likelihood induced by a heavy-tailed distribution, such as the Laplace or Student-t distribution. However, a non-Gaussian likelihood requires computationally expensive approximate Bayesian computation for posterior inference. The proposed approach models an outlier as a noisy and biased observation of an unknown regression function; accordingly, the likelihood contains bias terms that explain the degree of deviation from the regression function. We describe how the biases can be estimated accurately, together with the other hyperparameters, by regularized maximum likelihood estimation. Conditioned on the bias estimates, the robust GP regression reduces to a standard GP regression problem with analytical forms for the predictive mean and variance. The proposed approach is therefore simple and computationally attractive, and it gives robust and accurate GP estimates in many of the tested scenarios. For the numerical evaluation, we perform a comprehensive simulation study comparing the proposed approach with existing robust GP approaches under simulated scenarios with different outlier proportions and noise levels. The approach is also applied to data from two measurement systems, in which the predictors are based on robust environmental parameter measurements and the response variables come from more complex chemical sensing methods that contain a certain percentage of outliers. The computationally efficient GP regression with the bias model improves the utility of the measurement systems and the value of the environmental data.
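The central computational claim is that, once the biases are known, the problem reduces to ordinary GP regression. The sketch below illustrates that step. It assumes the bias terms have already been estimated and does not reproduce the paper's regularized maximum-likelihood step. It uses a squared-exponential kernel with fixed, hypothetical hyperparameters, subtracts the biases from the targets, and then applies the textbook Cholesky-based predictive mean and variance.

```python
import math


def rbf(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between two scalar inputs."""
    return signal_var * math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(m):
    """Lower-triangular L with L L^T = m, for a symmetric positive-definite m."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(m[i][i] - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


def solve_lower(low, b):
    """Solve L y = b by forward substitution."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i])
    return y


def solve_upper_t(low, y):
    """Solve L^T x = y by back substitution."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def gp_predict_with_bias(xs, ys, biases, x_star, noise_var=0.01):
    """Standard GP predictive mean and variance after subtracting given biases."""
    adjusted = [y - b for y, b in zip(ys, biases)]  # remove estimated outlier offsets
    n = len(xs)
    cov = [[rbf(xs[i], xs[j]) + (noise_var if i == j else 0.0) for j in range(n)]
           for i in range(n)]
    low = cholesky(cov)
    alpha = solve_upper_t(low, solve_lower(low, adjusted))  # K^{-1} (y - b)
    k_star = [rbf(x_star, x) for x in xs]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(low, k_star)
    var = rbf(x_star, x_star) - sum(t * t for t in v)
    return mean, var


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
clean = [math.sin(x) for x in xs]
contaminated = clean[:]
contaminated[2] += 5.0                    # one gross outlier
biases = [0.0, 0.0, 5.0, 0.0, 0.0]        # hypothetical bias estimates
robust = gp_predict_with_bias(xs, contaminated, biases, 2.5)
naive = gp_predict_with_bias(xs, contaminated, [0.0] * 5, 2.5)
reference = gp_predict_with_bias(xs, clean, [0.0] * 5, 2.5)
```

With correctly estimated biases, the prediction matches the one obtained from outlier-free data. Ignoring the bias lets the single outlier drag the predictive mean far from the underlying function.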

likelihood, outlier, robust gaussian process regression, (12 more...)

2001.04639

Country:

North America > United States > Ohio > Montgomery County > Dayton (0.04)
North America > United States > Florida > Monroe County > Key West (0.04)
North America > United States > Florida > Leon County > Tallahassee (0.04)

Genre: Research Report (0.64)

Industry: Government > Military (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

#artificialintelligence, Jan-13-2020, 14:20:52 GMT

Top AI algorithms for Healthcare

Despite their variety, applications of AI in clinical studies and healthcare services fall into two major categories: analysis of structured data, including images, genes and biomarkers, and analysis of unstructured data, such as notes, medical journals or patients' surveys, to complement the structured data. The former is fueled by machine learning and deep learning algorithms, while the latter rests on specialized natural language processing practices. ML algorithms chiefly extract features from data, such as patients' "traits" and medical outcomes of interest. For a long time, AI in healthcare was dominated by logistic regression, the simplest and most common algorithm for classification tasks. It was easy to use, quick to train and easy to interpret.
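The paragraph attributes logistic regression's long dominance to its simplicity and interpretability. The minimal sketch below fits the model by plain full-batch gradient descent and reads the coefficient as an odds ratio, which is the kind of interpretability the text alludes to. It assumes a single hypothetical biomarker and made-up binary outcomes, and it does not describe any particular clinical system.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=5000):
    """Fit weights and intercept by full-batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, xi):
    """Predicted probability that the outcome is 1."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)


# Hypothetical toy data: one biomarker level per patient, outcome 1 = condition present.
X = [[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X, y)
odds_ratio = math.exp(w[0])  # multiplicative change in the odds per unit of the biomarker
print(round(odds_ratio, 2))
```

The single number exp(w) summarises how the odds of the outcome change per unit of the input. That direct reading is much of what made the model easy to interpret.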

algorithm, regression, top ai algorithm, (12 more...)

#artificialintelligence

Genre:

Research Report > New Finding (0.80)
Research Report > Experimental Study (0.80)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.76)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.58)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)