Bayesian inverse regression for supervised dimension reduction with small datasets Machine Learning

We consider supervised dimension reduction problems, namely to identify a low dimensional projection of the predictors $\-x$ which can retain the statistical relationship between $\-x$ and the response variable $y$. We follow the idea of the sliced inverse regression (SIR) class of methods, which is to use the statistical information of the conditional distribution $\pi(\-x|y)$ to identify the dimension reduction (DR) space and in particular we focus on the task of computing this conditional distribution. We propose a Bayesian framework to compute the conditional distribution where the likelihood function is obtained using the Gaussian process regression model. The conditional distribution $\pi(\-x|y)$ can then be obtained directly by assigning weights to the original data points. We then can perform DR by considering certain moment functions (e.g. the first moment) of the samples of the posterior distribution. With numerical examples, we demonstrate that the proposed method is especially effective for small data problems.

Provable Gradient Variance Guarantees for Black-Box Variational Inference Machine Learning

Recent variational inference methods use stochastic gradient estimators whose variance is not well understood. Theoretical guarantees for these estimators are important to understand when these methods will or will not work. This paper gives bounds for the common "reparameterization" estimators when the target is smooth and the variational family is a location-scale distribution. These bounds are unimprovable and thus provide the best possible guarantees under the stated assumptions.

Smooth function approximation by deep neural networks with general activation functions Machine Learning

There has been a growing interest in expressivity of deep neural networks. But most of existing work about this topic focus only on the specific activation function such as ReLU or sigmoid. In this paper, we investigate the approximation ability of deep neural networks with a quite general class of activation functions. This class of activation functions includes most of frequently used activation functions. We derive the required depth, width and sparsity of a deep neural network to approximate any H\"older smooth function upto a given approximation error for the large class of activation functions. Based on our approximation error analysis, we derive the minimax optimality of the deep neural network estimators with the general activation functions in both regression and classification problems.

Linear vs Polynomial Regression Walk-Through


Fish get bigger as they get older. How predictive is fish length (cm) with age (yr) as the explanatory variable? Is the relationship best fit with a linear regression? First, let's bring in the data and a few important modules for the analysis: There are 77 instances in the data set. Now let's visualize the scatter-plot.

Advanced Machine Learning with Basic Excel


In this article, I present a few modern techniques that have been used in various business contexts, comparing performance with traditional methods. The advanced techniques in question are math-free, innovative, efficiently process large amounts of unstructured data, and are robust and scalable. Implementations in Python, R, Julia and Perl are provided, but here we focus on an Excel version that does not even require any Excel macros, coding, plug-ins, or anything other than the most basic version of Excel. It is actually easily implemented in standard, basic SQL too, and we invite readers to work on an SQL version. In short, we offer here an Excel template for machine learning and statistical computing, and it is quite powerful for an Excel spreadsheet.

Artificial Intelligence Made Easy with


If you're anything like my dad, you've worked in IT for decades but have only tangentially touched data science. Now, your new C-something-O wants you to fire up a data analytics team and work with new a set of buzzwords you've only vaguely heard about at conferences. Or perhaps you're a developer at a fast-moving startup and have spent weeks finalizing an algorithm, only to be stymied by issues with deploying the model onto your web application for real time use. For both cases, is definitely a solution worth looking into. positions itself as a software package that streamlines the machine learning process through its open source package H2O and AutoML.

Single-Channel Signal Separation and Deconvolution with Generative Adversarial Networks Machine Learning

Single-channel signal separation and deconvolution aims to separate and deconvolve individual sources from a single-channel mixture and is a challenging problem in which no prior knowledge of the mixing filters is available. Both individual sources and mixing filters need to be estimated. In addition, a mixture may contain non-stationary noise which is unseen in the training set. We propose a synthesizing-decomposition (S-D) approach to solve the single-channel separation and deconvolution problem. In synthesizing, a generative model for sources is built using a generative adversarial network (GAN). In decomposition, both mixing filters and sources are optimized to minimize the reconstruction error of the mixture. The proposed S-D approach achieves a peak-to-noise-ratio (PSNR) of 18.9 dB and 15.4 dB in image inpainting and completion, outperforming a baseline convolutional neural network PSNR of 15.3 dB and 12.2 dB, respectively and achieves a PSNR of 13.2 dB in source separation together with deconvolution, outperforming a convolutive non-negative matrix factorization (NMF) baseline of 10.1 dB.

Agriculture Commodity Arrival Prediction using Remote Sensing Data: Insights and Beyond Machine Learning

In developing countries like India agriculture plays an extremely important role in the lives of the population. In India, around 80\% of the population depend on agriculture or its by-products as the primary means for employment. Given large population dependency on agriculture, it becomes extremely important for the government to estimate market factors in advance and prepare for any deviation from those estimates. Commodity arrivals to market is an extremely important factor which is captured at district level throughout the country. Historical data and short-term prediction of important variables such as arrivals, prices, crop quality etc. for commodities are used by the government to take proactive steps and decide various policy measures. In this paper, we present a framework to work with short timeseries in conjunction with remote sensing data to predict future commodity arrivals. We deal with extremely high dimensional data which exceed the observation sizes by multiple orders of magnitude. We use cascaded layers of dimensionality reduction techniques combined with regularized regression models for prediction. We present results to predict arrivals to major markets and state wide prices for `Tur' (red gram) crop in Karnataka, India. Our model consistently beats popular ML techniques on many instances. Our model is scalable, time efficient and can be generalized to many other crops and regions. We draw multiple insights from the regression parameters, some of which are important aspects to consider when predicting more complex quantities such as prices in the future. We also combine the insights to generate important recommendations for different government organizations.

Early Detection of Long Term Evaluation Criteria in Online Controlled Experiments Artificial Intelligence

A common dilemma encountered by many upon implementing an optimization method or experiment, whether it be a reinforcement learning algorithm, or A/B testing, is deciding on what metric to optimize for. Very often short-term metrics, which are easier to measure are chosen over long term metrics which have undesirable time considerations and often a more complex calculation. In this paper, we argue the importance of choosing a metrics that focuses on long term effects. With this comes the necessity in the ability to measure significant differences between groups relatively early. We present here an efficient methodology for early detection of lifetime differences between groups based on bootstrap hypothesis testing of the lifetime forecast of the response. We present an application of this method in the domain of online advertising and we argue that approach not only allows one to focus on the ultimate metric of importance but also provides a means of accelerating the testing period.

Learning to Forget for Meta-Learning Machine Learning

Few-shot learning is a challenging problem where the system is required to achieve generalization from only few examples. Meta-learning tackles the problem by learning prior knowledge shared across a distribution of tasks, which is then used to quickly adapt to unseen tasks. Model-agnostic meta-learning (MAML) algorithm formulates prior knowledge as a common initialization across tasks. However, forcibly sharing an initialization brings about conflicts between tasks and thus compromises the quality of the initialization. In this work, by observing that the extent of compromise differs among tasks and between layers of a neural network, we propose a new initialization idea that employs task-dependent layer-wise attenuation, which we call selective forgetting. The proposed attenuation scheme dynamically controls how much of prior knowledge each layer will exploit for a given task. The experimental results demonstrate that the proposed method mitigates the conflicts and provides outstanding performance as a result. We further show that the proposed method, named L2F, can be applied and improve other state-of-the-art MAML-based frameworks, illustrating its generalizability.