Regression
Amobee at SemEval-2017 Task 4: Deep Learning System for Sentiment Detection on Twitter
Rozental, Alon, Fleischer, Daniel
This paper describes the Amobee sentiment analysis system, adapted to compete in SemEval 2017 task 4. The system consists of two parts: a supervised training of RNN models based on a Twitter sentiment treebank, and the use of feedforward NN, Naive Bayes and logistic regression classifiers to produce predictions for the different sub-tasks. The algorithm reached the 3rd place on the 5-label classification task (sub-task C).
A projection pursuit framework for testing general high-dimensional hypothesis
This article develops a framework for testing general hypothesis in high-dimensional models where the number of variables may far exceed the number of observations. Existing literature has considered less than a handful of hypotheses, such as testing individual coordinates of the model parameter. However, the problem of testing general and complex hypotheses remains widely open. We propose a new inference method developed around the hypothesis adaptive projection pursuit framework, which solves the testing problems in the most general case. The proposed inference is centered around a new class of estimators defined as $l_1$ projection of the initial guess of the unknown onto the space defined by the null. This projection automatically takes into account the structure of the null hypothesis and allows us to study formal inference for a number of long-standing problems. For example, we can directly conduct inference on the sparsity level of the model parameters and the minimum signal strength. This is especially significant given the fact that the former is a fundamental condition underlying most of the theoretical development in high-dimensional statistics, while the latter is a key condition used to establish variable selection properties. Moreover, the proposed method is asymptotically exact and has satisfactory power properties for testing very general functionals of the high-dimensional parameters. The simulation studies lend further support to our theoretical claims and additionally show excellent finite-sample size and power properties of the proposed test.
Top 20 Data Science MOOCs
Introduce yourself to the basics of data science and leave armed with practical experience extracting value from big data. This course teaches the basic techniques of data science, including both SQL and NoSQL solutions for massive data management (e.g., MapReduce and contemporaries), algorithms for data mining (e.g., clustering and association rule mining), and basic statistical modelling (e.g., linear and non-linear regression).
Adaptation and learning over networks for nonlinear system modeling
Scardapane, Simone, Chen, Jie, Richard, Cรฉdric
To be published as a chapter in'Adaptive Learning Methods for Nonlinear System Modeling', Elsevier Publishing, Eds. Abstract In this chapter, we analyze nonlinear filtering problems in distributed environments, e.g., sensor networks or peer-to-peer protocols. In these scenarios, the agents in the environment receive measurements in a streaming fashion, and they are required to estimate a common (nonlinear) model by alternating local computations and communications with their neighbors. We focus on the important distinction between single-task problems, where the underlying model is common to all agents, and multitask problems, where each agent might converge to a different model due to, e.g., spatial dependencies or other factors. Currently, most of the literature on distributed learning in the nonlinear case has focused on the single-task case, which may be a strong limitation in real-world scenarios. After introducing the problem and reviewing the existing approaches, we describe a simple kernel-based algorithm tailored for the multitask case. We evaluate the proposal on a simulated benchmark task, and we conclude by detailing currently open problems and lines of research.
Prediction of Daytime Hypoglycemic Events Using Continuous Glucose Monitoring Data and Classification Technique
Jung, Miyeon, Lee, You-Bin, Jin, Sang-Man, Park, Sung-Min
-- Daytime hypoglycemia should be accurately predicted to achieve normo glycemia and to avoid disastrous situations . Hypoglycemia, an abn ormally low blood glucose level, is divided into daytime hypogly cemia and nocturnal hypoglycemia . In this paper, we propose new predictor variables to predict daytime hypoglycemia using continuous glucose monitoring (CGM) data. We apply classification and regression tree (CART) as a prediction method . The evaluation results showed that our model wa s able to detect almost 80% of hypoglycemic events 15 min in advance, which was higher than the existing methods with similar conditions . T he proposed method might achieve a real - tim e prediction as well as can be e mbedded into BG monitoring device. Diabetes is one of the most common chronic diseases in the world, affecting 2.72 million individuals (10% of the population) in the Korea [1] and 29.1 million individuals (9.3% of the populat ion) in the USA with increasing incidence [2] . Diabetes can be th e cause of kidney failure, lower - limb amputations, and blindness among adults [2] . A chievement of excellent glycemia is most important task to diabetic patients in both type 1 and type 2 diabetes. D iabetic patient s should maintain euglycemic blood glucose (BG) levels while all day and be required the wisdom to avoid hyper - and hyp oglycemia [3] . Especially, the patients who treated w ith an insulin are at risk for developing hypoglycemia. Population - based data indicate that 30 - 40% o f people with type 1 diabetes ex perience an average of three episodes of severe hypoglycemia each year; those with insulin - treated type 2 diabetes experience about one episode of that each year. Also, individuals with type 1 diabetes experienced about 43 symptomatic (not only severe) episodes annually; insulin - treated individuals with type 2 diabetes experienced about 16 episodes annually [4] . The s ymptomatic hypoglycemic e pisode mean s that the patients feel the symptoms of s hakiness, sweating, hunger, irritability or headache [5] . H ypoglycemia is a significant challenge for a precise insulin therapy [6] .
Converting High-Dimensional Regression to High-Dimensional Conditional Density Estimation
There is a growing demand for nonparametric conditional density estimators (CDEs) in fields such as astronomy and economics. In astronomy, for example, one can dramatically improve estimates of the parameters that dictate the evolution of the Universe by working with full conditional densities instead of regression (i.e., conditional mean) estimates. More generally, standard regression falls short in any prediction problem where the distribution of the response is more complex with multi-modality, asymmetry or heteroscedastic noise. Nevertheless, much of the work on high-dimensional inference concerns regression and classification only, whereas research on density estimation has lagged behind. Here we propose FlexCode, a fully nonparametric approach to conditional density estimation that reformulates CDE as a non-parametric orthogonal series problem where the expansion coefficients are estimated by regression. By taking such an approach, one can efficiently estimate conditional densities and not just expectations in high dimensions by drawing upon the success in high-dimensional regression. Depending on the choice of regression procedure, our method can adapt to a variety of challenging high-dimensional settings with different structures in the data (e.g., a large number of irrelevant components and nonlinear manifold structure) as well as different data types (e.g., functional data, mixed data types and sample sets). We study the theoretical and empirical performance of our proposed method, and we compare our approach with traditional conditional density estimators on simulated as well as real-world data, such as photometric galaxy data, Twitter data, and line-of-sight velocities in a galaxy cluster.
SOFAR: large-scale association network learning
Uematsu, Yoshimasa, Fan, Yingying, Chen, Kun, Lv, Jinchi, Lin, Wei
Many modern big data applications feature large scale in both numbers of responses and predictors. Better statistical efficiency and scientific insights can be enabled by understanding the large-scale response-predictor association network structures via layers of sparse latent factors ranked by importance. Yet sparsity and orthogonality have been two largely incompatible goals. To accommodate both features, in this paper we suggest the method of sparse orthogonal factor regression (SOFAR) via the sparse singular value decomposition with orthogonality constrained optimization to learn the underlying association networks, with broad applications to both unsupervised and supervised learning tasks such as biclustering with sparse singular value decomposition, sparse principal component analysis, sparse factor analysis, and spare vector autoregression analysis. Exploiting the framework of convexity-assisted nonconvex optimization, we derive nonasymptotic error bounds for the suggested procedure characterizing the theoretical advantages. The statistical guarantees are powered by an efficient SOFAR algorithm with convergence property. Both computational and theoretical advantages of our procedure are demonstrated with several simulation and real data examples.
Exploiting random projections and sparsity with random forests and gradient boosting methods -- Application to multi-label and multi-output learning, random forest model compression and leveraging input sparsity
Within machine learning, the supervised learning field aims at modeling the input-output relationship of a system, from past observations of its behavior. Decision trees characterize the input-output relationship through a series of nested $if-then-else$ questions, the testing nodes, leading to a set of predictions, the leaf nodes. Several of such trees are often combined together for state-of-the-art performance: random forest ensembles average the predictions of randomized decision trees trained independently in parallel, while tree boosting ensembles train decision trees sequentially to refine the predictions made by the previous ones. The emergence of new applications requires scalable supervised learning algorithms in terms of computational power and memory space with respect to the number of inputs, outputs, and observations without sacrificing accuracy. In this thesis, we identify three main areas where decision tree methods could be improved for which we provide and evaluate original algorithmic solutions: (i) learning over high dimensional output spaces, (ii) learning with large sample datasets and stringent memory constraints at prediction time and (iii) learning over high dimensional sparse input spaces.
A Flexible Framework for Hypothesis Testing in High-dimensions
Javanmard, Adel, Lee, Jason D.
Hypothesis testing in the linear regression model is a fundamental statistical problem. We consider linear regression in the high-dimensional regime where the number of parameters exceeds the number of samples ($p> n$) and assume that the high-dimensional parameters vector is $s_0$ sparse. We develop a general and flexible $\ell_\infty$ projection statistic for hypothesis testing in this model. Our framework encompasses testing whether the parameter lies in a convex cone, testing the signal strength, testing arbitrary functionals of the parameter, and testing adaptive hypothesis. We show that the proposed procedure controls the type I error under the standard assumption of $s_0 (\log p)/\sqrt{n}\to 0$, and also analyze the power of the procedure. Our numerical experiments confirms our theoretical findings and demonstrate that we control false positive rate (type I error) near the nominal level, and have high power.