Regression
KNN Ensembles for Tweedie Regression: The Power of Multiscale Neighborhoods
Very few K-nearest-neighbor (KNN) ensembles exist, despite the efficacy of this approach in regression, classification, and outlier detection. Those that do exist focus on bagging features rather than varying k or bagging observations; it is unknown whether varying k or bagging observations can improve prediction. Given recent studies from topological data analysis, varying k may function like multiscale topological methods, providing stability and better prediction, as well as increased ensemble diversity. This paper explores 7 KNN ensemble algorithms combining bagged features, bagged observations, and varied k to understand how each of these contributes to model fit. Specifically, these algorithms are tested on Tweedie regression problems through simulations and 6 real datasets; results are compared to state-of-the-art machine learning models including extreme learning machines, random forest, boosted regression, and Morse-Smale regression. Simulation results suggest gains from varying k above and beyond bagging features or samples, and indicate that KNN ensembles are robust to the curse of dimensionality. KNN regression ensembles perform favorably against state-of-the-art algorithms and dramatically improve performance over plain KNN regression. Further, real-data results suggest that varying k is a good strategy in general (particularly for difficult Tweedie regression problems) and that KNN regression ensembles often outperform state-of-the-art methods. These results for k-varying ensembles echo recent theoretical results in topological data analysis, where multidimensional filter functions and multiscale coverings provide stability and performance gains over single-dimensional filters and single-scale coverings. This opens up the possibility of leveraging multiscale neighborhoods and multiple measures of local geometry in ensemble methods.
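The abstract gives no code, so the following is only a minimal NumPy sketch, under our own assumptions, of the three ingredients it compares: bagged observations, bagged features, and varied k. The function names, the uniform range used to draw k, the 70% feature fraction, and the toy compound Poisson-gamma (Tweedie-like) data are illustrative choices, not the paper's 7 algorithms.

```python
import numpy as np


def knn_predict(X_train, y_train, X_query, k):
    """Plain KNN regression: average the targets of the k nearest training points."""
    # Squared Euclidean distances, shape (n_query, n_train).
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    k = min(k, X_train.shape[0])
    nearest = np.argsort(d2, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)


def knn_ensemble_predict(X_train, y_train, X_query, n_members=25, k_fixed=10,
                         k_range=(2, 20), bag_obs=True, bag_features=True,
                         vary_k=True, feature_frac=0.7, seed=0):
    """Average of KNN regressors; each member may bag rows, bag columns, and draw its own k.

    All defaults (k_range, feature_frac, n_members) are hypothetical illustration values.
    """
    rng = np.random.default_rng(seed)
    n, p = X_train.shape
    member_preds = []
    for _ in range(n_members):
        # Bagged observations: bootstrap sample of training rows.
        rows = rng.integers(0, n, size=n) if bag_obs else np.arange(n)
        # Bagged features: random subset of columns drawn without replacement.
        if bag_features:
            m = max(1, int(round(feature_frac * p)))
            cols = rng.choice(p, size=m, replace=False)
        else:
            cols = np.arange(p)
        # Varied k: each member looks at a neighborhood of a different scale.
        k = int(rng.integers(k_range[0], k_range[1] + 1)) if vary_k else k_fixed
        member_preds.append(
            knn_predict(X_train[rows][:, cols], y_train[rows], X_query[:, cols], k))
    return np.mean(member_preds, axis=0)


# Toy compound Poisson-gamma response: N ~ Poisson(mu(x)), y = sum of N gamma(2, 0.5) draws.
rng = np.random.default_rng(1)
X = rng.uniform(size=(300, 5))
counts = rng.poisson(np.exp(X[:, 0] + X[:, 1]))
y = np.where(counts > 0, rng.gamma(np.maximum(counts, 1) * 2.0, 0.5), 0.0)
preds = knn_ensemble_predict(X[:200], y[:200], X[200:])
```

Switching off all three flags reduces the ensemble to a single KNN regressor with `k_fixed`, which is a convenient baseline for checking how much each ingredient contributes.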
Machine Learning Algorithms - Giuseppe Bonaccorso
My latest machine learning book has been published and will be available during the last week of July. In this book you will learn all the important machine learning algorithms that are commonly used in the field of data science. These algorithms cover supervised, unsupervised, semi-supervised, and reinforcement learning. A few famous algorithms covered in this book are linear regression, logistic regression, SVM, Naïve Bayes, K-Means, and random forest, along with feature engineering. You will also learn how these algorithms work and how to implement them in practice to solve your own problems.
10 Algorithms Machine Learning Engineers Need To Know About - 7wData
With the rapid mechanization brought about by the technological revolution, the word "manual" is slowly getting lost in the crowd and will very soon vanish completely. As big data has swept through the tech industry, machine learning has gained importance, robustly handling huge amounts of data and making accurate predictions. In an era of constant progress, we can only guess what astounding inventions and discoveries will come next. Data-crunching machines have been seamlessly executing these advanced techniques. Machine learning is a subset of artificial intelligence, which is a broader term and concept. Where artificial intelligence aims to make computers smarter, machine learning provides the ways to do so.
Signal and Noise Statistics Oblivious Sparse Reconstruction using OMP/OLS
Kallummil, Sreejith, Kalyani, Sheetal
Orthogonal matching pursuit (OMP) and orthogonal least squares (OLS) are widely used for sparse signal reconstruction in underdetermined linear regression problems. The performance of these compressed sensing (CS) algorithms depends crucially on a priori knowledge of either the sparsity of the signal ($k_0$) or the noise variance ($\sigma^2$). Both $k_0$ and $\sigma^2$ are unknown in general and extremely difficult to estimate in underdetermined models. This limits the application of OMP and OLS in many practical situations. In this article, we develop two computationally efficient frameworks, namely TF-IGP and RRT-IGP, for using OMP and OLS even when $k_0$ and $\sigma^2$ are unavailable. Both TF-IGP and RRT-IGP are analytically shown to accomplish successful sparse recovery under the same set of restricted isometry conditions on the design matrix that OMP/OLS require with a priori knowledge of $k_0$ and $\sigma^2$. Numerical simulations also indicate that TF-IGP and RRT-IGP perform highly competitively against OMP/OLS with a priori knowledge of $k_0$ and $\sigma^2$.
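For reference, here is a minimal NumPy sketch of standard OMP, the baseline that needs either $k_0$ or a noise-dependent residual threshold to know when to stop. It is not TF-IGP or RRT-IGP, whose details the abstract does not give; the function name `omp` and the `tol` parameter are our own illustrative choices.

```python
import numpy as np


def omp(A, y, k0=None, tol=None):
    """Orthogonal matching pursuit for y ~ A x with sparse x.

    Stops after k0 atoms (sparsity known a priori) or once the residual norm
    drops to tol (a hypothetical threshold that in practice depends on sigma^2).
    """
    if k0 is None and tol is None:
        raise ValueError("OMP needs either k0 or a residual tolerance")
    n, p = A.shape
    max_iter = k0 if k0 is not None else min(n, p)
    residual = y.astype(float).copy()
    support, coef = [], np.zeros(0)
    for _ in range(max_iter):
        if tol is not None and np.linalg.norm(residual) <= tol:
            break
        # Greedy step: pick the column most correlated with the current residual.
        scores = np.abs(A.T @ residual)
        scores[support] = -np.inf  # never reselect an atom already in the support
        support.append(int(np.argmax(scores)))
        # Orthogonal step: least-squares refit on the whole current support.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x_hat = np.zeros(p)
    x_hat[support] = coef
    return x_hat, sorted(support)
```

The two stopping rules mirror the two kinds of prior knowledge the abstract mentions: a fixed iteration count needs $k_0$, while a residual threshold needs a noise level.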
An Interactive Greedy Approach to Group Sparsity in High Dimension
Qian, Wei, Li, Wending, Sogawa, Yasuhiro, Fujimaki, Ryohei, Yang, Xitong, Liu, Ji
Sparsity learning with known grouping structures has received considerable attention due to its wide range of modern applications in high-dimensional data analysis. Although the advantages of using group information have been well studied for shrinkage-based approaches, the benefits of group sparsity have not been well documented for greedy-type methods, which limits our understanding and use of this important class of methods. In this paper, generalizing a popular forward-backward greedy approach, we propose a new interactive greedy algorithm for group sparsity learning and prove that the proposed greedy-type algorithm attains the desired benefits of group sparsity in high-dimensional settings. An estimation error bound that refines those of other existing methods and a guarantee for group support recovery are also established simultaneously. In addition, an interactive feature is incorporated to allow extra algorithmic flexibility without compromising the theoretical properties. The promise of our proposal is demonstrated through numerical evaluations, including a real industrial application in human activity recognition.
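To make "greedy-type methods for group sparsity" concrete, here is a minimal sketch of plain forward greedy group selection (group-OMP style). It is a deliberately simplified stand-in: it has no backward deletion step and no interactive component, so it is not the authors' algorithm, and the function name and scoring rule are our own assumptions.

```python
import numpy as np


def group_forward_greedy(A, y, groups, n_select):
    """Forward greedy group selection (group-OMP style, illustrative only).

    groups: list of integer index arrays partitioning the columns of A.
    At each step, add the group whose columns are most correlated (in l2 norm)
    with the current residual, then refit least squares on all selected groups.
    """
    selected = []
    residual = y.astype(float).copy()
    coef = np.zeros(A.shape[1])
    for _ in range(n_select):
        scores = [-np.inf if g in selected else np.linalg.norm(A[:, idx].T @ residual)
                  for g, idx in enumerate(groups)]
        selected.append(int(np.argmax(scores)))
        # Refit jointly on every column belonging to a selected group.
        cols = np.concatenate([np.asarray(groups[g]) for g in selected])
        sol, *_ = np.linalg.lstsq(A[:, cols], y, rcond=None)
        residual = y - A[:, cols] @ sol
    if selected:
        coef[cols] = sol
    return coef, sorted(selected)
```

Selecting whole groups at once is what lets a greedy method exploit the grouping structure; a forward-backward variant would additionally try removing groups whose deletion barely increases the loss.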
Linear Convergence of SVRG in Statistical Estimation
In this paper we establish a fast convergence rate for stochastic variance reduced gradient (SVRG) on a class of problems, motivated by applications in high-dimensional statistics, that are not strongly convex or are even non-convex. High-dimensional statistics has achieved remarkable success in the last decade, including results on consistency and rates for various estimators under non-asymptotic high-dimensional scaling, especially when the problem dimension p is larger than the number of data points n [e.g., Negahban et al., 2009; Candès and Recht, 2009; and many others: Candès et al., 2006; Wainwright, 2006; Chen et al., 2011]. It is now well known that while this setup appears ill-posed, estimation or recovery is indeed possible by exploiting the underlying structure of the parameter space; notable examples include sparse vectors, low-rank matrices, and structured regression functions, among others. Recently, estimators leading to non-convex optimization problems have attracted rapidly growing attention. Not only do they typically have better statistical properties in the high-dimensional regime, but, contrary to common belief, in many cases there exist efficient algorithms that provably find near-optimal solutions [Loh and Wainwright, 2011; Zhang and Zhang, 2012; Loh and Wainwright, 2013]. The computational challenges of statistical estimators and machine learning algorithms have been an active area of study, thanks to countless applications involving big data: datasets where both p and n are large.
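As a reminder of what SVRG actually computes, here is a minimal NumPy sketch on a least-squares toy problem. Note the mismatch with the paper's setting: this toy objective is strongly convex, whereas the paper targets problems that are not. The step size $0.2/L_{\max}$, the inner-loop length $m = 2n$, and using the last inner iterate as the next snapshot are our own illustrative choices.

```python
import numpy as np


def svrg_least_squares(A, b, step, n_epochs=40, inner_steps=None, seed=0):
    """SVRG for f(w) = (1/2n) * sum_i (a_i^T w - b_i)^2.

    Each epoch computes the full gradient at a snapshot w_snap, then takes
    inner steps with the variance-reduced gradient
        grad_i(w) - grad_i(w_snap) + full_grad(w_snap).
    """
    rng = np.random.default_rng(seed)
    n, p = A.shape
    m = inner_steps if inner_steps is not None else 2 * n  # hypothetical default
    w_snap = np.zeros(p)
    for _ in range(n_epochs):
        full_grad = A.T @ (A @ w_snap - b) / n
        w = w_snap.copy()
        for _ in range(m):
            i = rng.integers(n)
            a_i = A[i]
            g_w = a_i * (a_i @ w - b[i])          # gradient of the i-th term at w
            g_snap = a_i * (a_i @ w_snap - b[i])  # same term at the snapshot
            w -= step * (g_w - g_snap + full_grad)
        w_snap = w  # use the last inner iterate as the next snapshot
    return w_snap


rng = np.random.default_rng(0)
A = rng.normal(size=(200, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
b = A @ w_true
L_max = np.max(np.sum(A ** 2, axis=1))  # largest per-sample smoothness constant
w_hat = svrg_least_squares(A, b, step=0.2 / L_max)
```

Because the correction term `g_w - g_snap` shrinks as both `w` and `w_snap` approach the optimum, the stochastic noise vanishes and a constant step size suffices, which is what makes a linear rate possible.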
Intro -- Starting AI w/ fast.ai - Wayne Nixalo - Medium
I found www.fast.ai in April 2017 and was a bit blown away. An AI course focused on actually getting things done? I was just finishing Yaser Abu-Mostafa's CS1156x 'Learning from Data' on edX, and while a great theoretical course, it did cut down a lot of my enthusiasm for Machine Learning. I guess learning to code in Python while writing Linear Regression models by hand has that effect. What really got me about Jeremy Howard's 'Practical Deep Learning I' (which I'll call FAI01/FADL1) was that, over and over again, he'd explain a thing, you'd go do it, and all of a sudden you're catapulted to the forefront of applied ML.
Probabilistic Graphical Models for Credibility Analysis in Evolving Online Communities
One of the major hurdles preventing the full exploitation of information from online communities is the widespread concern regarding the quality and credibility of user-contributed content. Prior work in this domain operates on a static snapshot of the community, makes strong assumptions about the structure of the data (e.g., relational tables), or considers only shallow features for text classification. To address these limitations, we propose probabilistic graphical models that leverage the joint interplay between multiple factors in online communities (such as user interactions, community dynamics, and textual content) to automatically assess the credibility of user-contributed online content and the expertise of users, as well as how that expertise evolves, with user-interpretable explanations. To this end, we devise new models based on Conditional Random Fields for different settings, such as incorporating partial expert knowledge for semi-supervised learning and handling discrete labels as well as numeric ratings for fine-grained analysis. This enables applications such as extracting reliable side effects of drugs from user-contributed posts in health forums, and identifying credible content in news communities. Online communities are dynamic, as users join and leave, adapt to evolving trends, and mature over time. To capture these dynamics, we propose generative models based on Hidden Markov Models, Latent Dirichlet Allocation, and Brownian motion to trace the continuous evolution of user expertise and of users' language models over time. This allows us to identify expert users and credible content jointly over time, improving state-of-the-art recommender systems by explicitly considering the maturity of users. It also enables applications such as identifying helpful product reviews and detecting fake and anomalous reviews with limited information.
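One ingredient named above, a Hidden Markov Model over user expertise, can be illustrated with the generic normalized forward recursion, which tracks the belief over a user's hidden expertise level as new evidence arrives. This is only a sketch: the three discrete expertise levels, the binary "helpful/unhelpful" observations, and every probability below are hypothetical, and the paper's actual models (which also use LDA and Brownian motion for continuous evolution) are richer.

```python
import numpy as np


def forward_filter(pi, T, E, obs):
    """Normalized HMM forward recursion: P(state at t | observations up to t).

    pi:  initial state distribution, shape (S,)
    T:   transition matrix, T[i, j] = P(next state j | current state i)
    E:   emission matrix, E[i, o] = P(observation o | state i)
    obs: sequence of observation indices
    """
    alpha = pi * E[:, obs[0]]
    alpha = alpha / alpha.sum()
    history = [alpha]
    for o in obs[1:]:
        alpha = (alpha @ T) * E[:, o]  # predict with T, then weight by the evidence
        alpha = alpha / alpha.sum()
        history.append(alpha)
    return np.array(history)


# Hypothetical expertise levels: 0 = novice, 1 = intermediate, 2 = expert.
pi = np.array([0.6, 0.3, 0.1])
T = np.array([[0.9, 0.1, 0.0],   # hypothetical: users can only mature or stay put
              [0.0, 0.9, 0.1],
              [0.0, 0.0, 1.0]])
# Hypothetical observations: 0 = post rated unhelpful, 1 = post rated helpful.
E = np.array([[0.8, 0.2],
              [0.5, 0.5],
              [0.2, 0.8]])
beliefs = forward_filter(pi, T, E, [1, 1, 0, 1, 1, 1, 1, 1])
```

The upper-triangular transition matrix encodes the idea that users mature over time; each row of `beliefs` is the filtered distribution over expertise after that many posts.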
Neighborhood Features Help Detecting Non-Technical Losses in Big Data Sets
Glauner, Patrick, Meira, Jorge, Dolberg, Lautaro, State, Radu, Bettinger, Franck, Rangoni, Yves, Duarte, Diogo
Electricity theft is a major problem around the world, in both developed and developing countries, and may account for up to 40% of the total electricity distributed. More generally, electricity theft belongs to non-technical losses (NTL), which are losses that occur during the distribution of electricity in power grids. In this paper, we build features from the neighborhood of customers. We first split the area in which the customers are located into grids of different sizes. For each grid cell, we then compute the proportion of inspected customers and the proportion of NTL found among the inspected customers. We then analyze the distributions of the generated features and show why they are useful for predicting NTL. In addition, we compute features from the consumption time series of customers. We also use master data features of customers, such as their customer class and the voltage of their connection. We compute these features for a Big Data set of 31M meter readings, 700K customers and 400K inspection results. We then use these features to train four machine learning algorithms that are particularly suitable for Big Data sets because of their parallelizable structure: logistic regression, k-nearest neighbors, linear support vector machine and random forest. Using the neighborhood features, rather than analyzing only the time series, yielded appreciable results on Big Data sets for NTL proportions varying from 1% to 90%. This work can therefore be deployed to a wide range of different regions around the world.
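The two neighborhood features described above can be sketched as follows. We assume planar customer coordinates and square grid cells; the function names, the coordinate system, and returning NaN for cells with no inspections are our own illustrative choices, not details from the paper.

```python
from collections import defaultdict

import numpy as np


def grid_features(x, y, inspected, ntl, cell_size):
    """Per-customer neighborhood features on a square grid of side `cell_size`.

    Returns, for each customer, (a) the share of customers in the same cell that
    were inspected and (b) the share of NTL among the inspected customers of that
    cell (NaN when nobody in the cell was inspected; a hypothetical convention).
    """
    cells = list(zip(np.floor(np.asarray(x) / cell_size).astype(int),
                     np.floor(np.asarray(y) / cell_size).astype(int)))
    n_total, n_insp, n_ntl = defaultdict(int), defaultdict(int), defaultdict(int)
    for cell, was_inspected, has_ntl in zip(cells, inspected, ntl):
        n_total[cell] += 1
        if was_inspected:
            n_insp[cell] += 1
            n_ntl[cell] += int(bool(has_ntl))
    insp_share = np.array([n_insp[c] / n_total[c] for c in cells])
    ntl_share = np.array([n_ntl[c] / n_insp[c] if n_insp[c] else np.nan for c in cells])
    return insp_share, ntl_share


def multiscale_grid_features(x, y, inspected, ntl, cell_sizes):
    """One pair of features per grid size, keyed by the (hypothetical) cell size."""
    return {s: grid_features(x, y, inspected, ntl, s) for s in cell_sizes}
```

As written, each customer's own inspection outcome contributes to its cell's NTL share; when these features feed a classifier, excluding the customer's own label (leave-one-out counts) may be needed to avoid target leakage. The abstract does not say how the authors handled this.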
Error Bounds for Piecewise Smooth and Switching Regression
The paper deals with regression problems in which the nonsmooth target is assumed to switch between different operating modes. Specifically, piecewise smooth (PWS) regression considers target functions switching deterministically via a partition of the input space, while switching regression considers arbitrary switching laws. The paper derives generalization error bounds in these two settings by following the approach based on Rademacher complexities. For PWS regression, our derivation involves a chaining argument and a decomposition of the covering numbers of PWS classes in terms of the ones of their component functions and the capacity of the classifier partitioning the input space. This yields error bounds with a radical dependency on the number of modes. For switching regression, the decomposition can be performed directly at the level of the Rademacher complexities, which yields bounds with a linear dependency on the number of modes. By using once more chaining and a decomposition at the level of covering numbers, we show how to recover a radical dependency, however at the cost of a slightly worse convergence rate. Examples of applications are given, in particular for PWS and switching regression with linear and kernel-based component functions.
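To make the PWS model class concrete (not the error bounds), here is a minimal sketch of a piecewise affine predictor: a linear multi-class classifier partitions the input space into modes, and the selected mode's affine component function produces the output. The argmax-of-linear-scores classifier and the affine components are our own illustrative choices, in the spirit of the linear component functions the abstract mentions. In switching regression the mode is not a function of the input, so this predictor structure does not apply there directly.

```python
import numpy as np


def pws_predict(X, W, c, Theta, d):
    """Piecewise smooth (here: piecewise affine) prediction with s modes.

    A linear multi-class classifier, mode(x) = argmax_j (W x + c)_j, partitions the
    input space; the chosen mode's affine component f_j(x) = Theta_j . x + d_j
    produces the output.
    """
    modes = np.argmax(X @ W.T + c, axis=1)  # which region each input falls in
    outputs = X @ Theta.T + d               # all s component outputs, shape (n, s)
    return outputs[np.arange(X.shape[0]), modes], modes


# Two modes on the real line: the partition is x < 0 vs x > 0, and the target is |x|.
W = np.array([[-1.0], [1.0]])
c = np.zeros(2)
Theta = np.array([[-1.0], [1.0]])
d = np.zeros(2)
y_hat, modes = pws_predict(np.array([[-2.0], [3.0]]), W, c, Theta, d)
```

The bounds discussed above depend on both pieces of this construction: the capacity of the component function class and the capacity of the classifier that defines the partition.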