Regression
Convergence Rates of Empirical Bayes Posterior Distributions: A Variational Perspective
We study the convergence rates of empirical Bayes posterior distributions for nonparametric and high-dimensional inference. We show that as long as the hyperparameter set is discrete, the empirical Bayes posterior distribution induced by the maximum marginal likelihood estimator can be regarded as a variational approximation to a hierarchical Bayes posterior distribution. This connection between empirical Bayes and variational Bayes allows us to leverage recent results in the variational Bayes literature and to directly obtain the convergence rates of empirical Bayes posterior distributions from a variational perspective. For a more general hyperparameter set that is not necessarily discrete, we introduce a new technique called "prior decomposition" to deal with prior distributions that can be written as convex combinations of probability measures whose supports are low-dimensional subspaces. This leads to generalized versions of the classical "prior mass and testing" conditions for the convergence rates of empirical Bayes. Our theory is applied to a number of statistical estimation problems including nonparametric density estimation and sparse linear regression.
Health-behaviors associated with the growing risk of adolescent suicide attempts: A data-driven cross-sectional study
Wei, Zhiyuan, Mukherjee, Sayanti
Purpose: Identify and examine the associations between health behaviors and increased risk of adolescent suicide attempts, while controlling for socioeconomic and demographic differences. Design: A data-driven analysis using cross-sectional data. Setting: Communities in the state of Montana from 1999 to 2017. Subjects: Selected 22,447 adolescents, of whom 1,631 attempted suicide at least once. Measures: Overall, 29 variables (predictors) accounting for psychological behaviors, illegal substance consumption, daily activities at school and demographic backgrounds were considered. Analysis: A library of machine learning algorithms, along with the traditionally used logistic regression, was used to model and predict suicide attempt risk. Model performance (goodness-of-fit and predictive accuracy) was measured using accuracy, precision, recall and F-score metrics. Results: The non-parametric Bayesian tree ensemble model outperformed all other models, with 80.0% accuracy in goodness-of-fit (F-score: 0.802) and 78.2% in predictive accuracy (F-score: 0.785). Key health behaviors identified include: being sad/hopeless, followed by safety concerns at school, physical fighting, inhalant usage, illegal drug consumption at school, current cigarette usage, and having first sex at an early age (below 15 years of age). Additionally, minority groups (American Indian/Alaska Natives, Hispanics/Latinos) and females are also found to be highly vulnerable to attempting suicide. Conclusion: A significant contribution of this work is an understanding of the key health behaviors and health disparities that lead to a higher frequency of suicide attempts among adolescents, while accounting for the non-linearity and complex interactions among the outcome and the exposure variables.
Nowcasting in a Pandemic using Non-Parametric Mixed Frequency VARs
Huber, Florian, Koop, Gary, Onorante, Luca, Pfarrhofer, Michael, Schreiner, Josef
This paper develops Bayesian econometric methods for posterior and predictive inference in a non-parametric mixed frequency VAR using additive regression trees. We argue that regression tree models are ideally suited for macroeconomic nowcasting in the face of the extreme observations produced by the pandemic due to their flexibility and ability to model outliers. In a nowcasting application involving four major countries in the European Union, we find substantial improvements in nowcasting performance relative to a linear mixed frequency VAR. A detailed examination of the predictive densities in the first six months of 2020 shows where these improvements are achieved.
Machine Learning for Data Analysis
Over the course of an hour, an unsolicited email skips your inbox and goes straight to spam, a car next to you auto-stops when a pedestrian runs in front of it, and an ad for the product you were thinking about yesterday pops up on your social media feed. What do these events all have in common? It's artificial intelligence that has guided all these decisions. And the force behind them all is machine-learning algorithms that use data to predict outcomes. Now, before we look at how machine learning aids data analysis, let's explore the fundamentals of each.
A Novel Training Protocol for Performance Predictors of Evolutionary Neural Architecture Search Algorithms
Sun, Yanan, Sun, Xian, Fang, Yuhan, Yen, Gary
Evolutionary Neural Architecture Search (ENAS) can automatically design the architectures of Deep Neural Networks (DNNs) using evolutionary computation algorithms. However, most ENAS algorithms require intensive computational resources, which are not necessarily available to interested users. Performance predictors are a type of regression model that can assist the search without requiring much computational resource. Although various performance predictors have been designed, they all employ the same training protocol to build the regression models: 1) sampling a set of DNNs with their performance as the training dataset, 2) training the model with the mean square error criterion, and 3) predicting the performance of DNNs newly generated during the ENAS. In this paper, we point out, through intuitive and illustrative examples, that the three steps constituting this training protocol are not well thought out. Furthermore, we propose a new training protocol to address these issues, consisting of designing a pairwise ranking indicator to construct the training target, using logistic regression to fit the training samples, and developing a differential method to build the training instances. To verify the effectiveness of the proposed training protocol, four widely used regression models in the field of machine learning have been chosen to perform the comparisons on two benchmark datasets. The experimental results of all the comparisons demonstrate that the proposed training protocol can significantly improve the performance prediction accuracy compared with the traditional training protocols.
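The proposed protocol (pairwise ranking target, logistic regression, difference-based training instances) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the two-number architecture encoding, the toy scores, the use of plain feature differences as instances, and the helper names `build_pairwise_instances`, `fit_logistic` and `prefers`.

```python
import math

def build_pairwise_instances(features, scores):
    """Build (difference vector, label) pairs; label is 1 if the first sample scores higher."""
    instances = []
    for i in range(len(features)):
        for j in range(len(features)):
            if i == j:
                continue
            diff = [a - b for a, b in zip(features[i], features[j])]
            label = 1 if scores[i] > scores[j] else 0
            instances.append((diff, label))
    return instances

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(instances, lr=0.5, epochs=200):
    """Fit logistic regression weights (no bias term) by batch gradient descent."""
    dim = len(instances[0][0])
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for x, y in instances:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for k in range(dim):
                grad[k] += (p - y) * x[k]
        w = [wi - lr * g / len(instances) for wi, g in zip(w, grad)]
    return w

def prefers(w, a, b):
    """Return True if the fitted model ranks architecture a above architecture b."""
    return sum(wi * (ai - bi) for wi, ai, bi in zip(w, a, b)) > 0

# Hypothetical encodings (e.g. depth, width level) with made-up validation accuracies.
features = [[1.0, 0.0], [2.0, 1.0], [3.0, 1.0], [4.0, 3.0]]
scores = [0.60, 0.70, 0.80, 0.90]

w = fit_logistic(build_pairwise_instances(features, scores))
print(prefers(w, [5.0, 2.0], [1.0, 2.0]))
```

During a search, such a predictor would only need to say which of two candidate architectures is likely better, which is what selection in an evolutionary algorithm requires; it does not estimate absolute accuracy.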
Iterative Correction of Sensor Degradation and a Bayesian Multi-Sensor Data Fusion Method
Kolar, Luka, Šikonja, Rok, Treven, Lenart
We present a novel method for inferring the ground-truth signal from multiple degraded signals, affected by different amounts of sensor "exposure". The algorithm learns a multiplicative degradation effect by performing iterative corrections of two signals solely from the ratio between them. The degradation function d should be continuous, satisfy monotonicity, and satisfy d(0) = 1. We use a smoothed monotonic regression method, in which the aforementioned criteria are easily incorporated into the fitting step. We include theoretical analysis and prove convergence to the ground-truth signal for the noiseless measurement model. Lastly, we present an approach to fuse the noisy corrected signals using Gaussian processes. We use sparse Gaussian processes that can be utilized for a large number of measurements together with a specialized kernel that enables the estimation of noise values of all sensors. The data fusion framework naturally handles data gaps and provides a simple and powerful method for observing the signal trends on multiple timescales (long-term and short-term signal properties). The viability of the correction method is evaluated on a synthetic dataset with known ground-truth signal.
Parallel Extraction of Long-term Trends and Short-term Fluctuation Framework for Multivariate Time Series Forecasting
Xu, Haoyan, Duan, Ziheng, Huang, Yida, Feng, Jie, Ren, Anni, Zhang, Qianru, Song, Pengyu, Wang, Xiaoqian
Multivariate time series (MTS) forecasting is widely used in various fields. Reasonable prediction results can assist people in planning and decision-making, generate benefits and avoid risks. Time series normally have two characteristics: a long-term trend and short-term fluctuation. For example, stock prices will have a long-term upward trend with the market, but there may be a small decline in the short term. These two characteristics are often relatively independent of each other. However, existing prediction methods often do not distinguish between them, which reduces the accuracy of the prediction model. In this paper, an MTS forecasting framework that can capture the long-term trends and short-term fluctuations of time series in parallel is proposed. This method uses the original time series and its first difference to characterize long-term trends and short-term fluctuations. Three prediction sub-networks are constructed to predict long-term trends, short-term fluctuations and the final value to be predicted. The overall optimization objective borrows the idea of multi-task learning: the predictions of long-term trends and short-term fluctuations are made as close to their real values as possible, while the final output is required to approximate the values to be predicted. In this way, the proposed method uses more supervision information and can more accurately capture the changing trend of the time series, thereby improving the forecasting performance.
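The two ingredients described above, the first difference as a fluctuation signal and a multi-task objective over three predictions, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the loss, so the sum of mean squared errors and the hypothetical weights `alpha` and `beta` are assumptions, and the three sub-networks themselves are not modelled.

```python
def first_difference(series):
    """Return x[t+1] - x[t] for each consecutive pair, a simple short-term fluctuation signal."""
    return [b - a for a, b in zip(series, series[1:])]

def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

def multitask_loss(pred_trend, true_trend,
                   pred_fluct, true_fluct,
                   pred_final, true_final,
                   alpha=1.0, beta=1.0):
    """Final-value error plus weighted trend and fluctuation errors.

    alpha and beta are hypothetical weights chosen for this sketch.
    """
    return (mse(pred_final, true_final)
            + alpha * mse(pred_trend, true_trend)
            + beta * mse(pred_fluct, true_fluct))

# Made-up price series for illustration.
prices = [10.0, 12.0, 11.0, 15.0]
fluctuations = first_difference(prices)
print(fluctuations)
```

The auxiliary terms supervise the trend and fluctuation predictions directly, which is the extra supervision information the abstract refers to.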
Intro to Machine learning
It has long been understood that learning is a key element of intelligence. This holds for both natural intelligence (we all get smarter by learning) and artificial intelligence. The roots of machine learning are in statistics, which can also be thought of as the art of extracting knowledge from data. In particular, methods such as linear regression and Bayesian statistics, which are both already more than two centuries old (!), are even today at the heart of machine learning. For more examples and a brief history, see the timeline of machine learning (Wikipedia). Examples include predicting the number of people who will click a Google ad based on the ad content and data about the user's prior online behavior, predicting the number of traffic accidents based on road conditions and speed limit, or predicting the selling price of real estate based on its location, size, and condition.
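As a small illustration of the real-estate example, the sketch below fits a one-variable linear regression by ordinary least squares. The sizes and prices are made up for the example, and `fit_line` is a hypothetical helper name; real price models would use more predictors, such as location and condition.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept with one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data: floor area in square metres and selling price in thousands.
sizes = [50.0, 80.0, 100.0, 120.0]
prices = [110.0, 170.0, 210.0, 250.0]

slope, intercept = fit_line(sizes, prices)
predicted = slope * 90.0 + intercept  # predicted price for a 90 square metre home
print(slope, intercept, predicted)
```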
Machine Learning Regression Masterclass in Python
The Artificial Intelligence (AI) revolution is here! The technology is progressing at a massive scale and is being widely adopted in the healthcare, defense, banking, gaming, transportation and robotics industries. Machine Learning is a subfield of Artificial Intelligence that enables machines to improve at a given task with experience. Machine Learning is an extremely hot topic; the demand for experienced machine learning engineers and data scientists has been growing steadily over the past 5 years. According to a report released by Research and Markets, the global AI and machine learning technology sectors are expected to grow from $1.4B to $8.8B by 2022, and it is predicted that the AI tech sector will create around 2.3 million jobs by 2020.
Data Analytics Learning Path - Gift Course
This online tutorial teaches you MS Excel from scratch, covering all the essential topics such as Pivots, Macros and Analytics. Learning SQL for Data Analytics is now easy with this online tutorial. Enroll today to master SQL from the beginning by learning SQL commands and tools. Get started with the Machine Learning Basics: Classification models in Python course to master ML basics. Get insights into Machine Learning classification models using Python with this online tutorial.