Goto

Collaborating Authors

 Regression


High SNR Consistent Compressive Sensing Without Signal and Noise Statistics

arXiv.org Machine Learning

Recovering the support of sparse vectors in underdetermined linear regression models, \textit{aka}, compressive sensing is important in many signal processing applications. High SNR consistency (HSC), i.e., the ability of a support recovery technique to correctly identify the support with increasing signal to noise ratio (SNR) is an increasingly popular criterion to qualify the high SNR optimality of support recovery techniques. The HSC results available in literature for support recovery techniques applicable to underdetermined linear regression models like least absolute shrinkage and selection operator (LASSO), orthogonal matching pursuit (OMP) etc. assume \textit{a priori} knowledge of noise variance or signal sparsity. However, both these parameters are unavailable in most practical applications. Further, it is extremely difficult to estimate noise variance or signal sparsity in underdetermined regression models. This limits the utility of existing HSC results. In this article, we propose two techniques, \textit{viz.}, residual ratio minimization (RRM) and residual ratio thresholding with adaptation (RRTA) to operate OMP algorithm without the \textit{a priroi} knowledge of noise variance and signal sparsity and establish their HSC analytically and numerically. To the best of our knowledge, these are the first and only noise statistics oblivious algorithms to report HSC in underdetermined regression models.


When do Words Matter? Understanding the Impact of Lexical Choice on Audience Perception using Individual Treatment Effect Estimation

arXiv.org Machine Learning

Studies across many disciplines have shown that lexical choice can affect audience perception. For example, how users describe themselves in a social media profile can affect their perceived socio-economic status. However, we lack general methods for estimating the causal effect of lexical choice on the perception of a specific sentence. While randomized controlled trials may provide good estimates, they do not scale to the potentially millions of comparisons necessary to consider all lexical choices. Instead, in this paper, we first offer two classes of methods to estimate the effect on perception of changing one word to another in a given sentence. The first class of algorithms builds upon quasi-experimental designs to estimate individual treatment effects from observational data. The second class treats treatment effect estimation as a classification problem. We conduct experiments with three data sources (Yelp, Twitter, and Airbnb), finding that the algorithmic estimates align well with those produced by randomized-control trials. Additionally, we find that it is possible to transfer treatment effect classifiers across domains and still maintain high accuracy.


Top 10 Most Popular AI Models - DZone AI

#artificialintelligence

While Artificial Intelligence and Machine Learning provide ample possibilities for businesses to improve their operations and maximize their revenues, there is no such thing as a "free lunch." The "no free lunch" problem is the AI/ML industry adaptation of the age-old "no one-size-fits-all" problem. The array of problems the businesses face is huge, and the variety of ML models used to solve these problems is quite wide, as some algorithms are better at dealing with certain types of problems than the others. We will explain the basic features and areas of application for all these algorithms below. However, we have to explain the basic principle of Machine Learning beforehand.


Adversarial Learning and Explainability in Structured Datasets

arXiv.org Machine Learning

We theoretically and empirically explore the explainability benefits of adversarial learning in logistic regression models on structured datasets. In particular we focus on improved explainability due to significantly higher $\textit{feature-concentration}$ in adversarially-learned models: Compared to natural training, adversarial training tends to more efficiently shrink the weights of non-predictive and weakly-predictive features, while model performance on natural test data only degrades slightly (and even sometimes improves), compared to that of a naturally trained model. We provide theoretical insight into this phenomenon via an analysis of the expectation of the logistic model weight updates by an SGD-based adversarial learning algorithm, where examples are drawn from a random binary data-generation process. We empirically demonstrate the feature-pruning effect on a synthetic dataset, some datasets from the UCI repository, and real-world large-scale advertising response-prediction data-sets from MediaMath. In several of the MediaMath datasets there are 10s of millions of data points, and on the order of 100,000 sparse categorical features, and adversarial learning often results in model-size reduction by a factor of 20 or higher, and yet the model performance on natural test data (measured by AUC) is comparable to (and sometimes even better than) that of the naturally trained model. We also show that traditional $\ell_1$ regularization does not even come close to achieving this level of feature-concentration. We measure "feature concentration" using the Integrated Gradients-based feature-attribution method of Sundararajan et. al (2017), and derive a new closed-form expression for 1-layer networks, which substantially speeds up computation of aggregate feature attributions across a large dataset.


Predicting the Stock Market Using Machine Learning and Deep Learning

#artificialintelligence

There is not a huge difference in the RMSE value, but a plot for the predicted and actual values should provide a more clear understanding. The RMSE value is almost similar to the linear regression model and the plot shows the same pattern. Like linear regression, kNN also identified a drop in January 2018 since that has been the pattern for the past years. We can safely say that regression algorithms have not performed well on this dataset. Let's go ahead and look at some time series forecasting techniques to find out how they perform when faced with this stock prices prediction challenge. ARIMA is a very popular statistical method for time series forecasting. ARIMA models take into account the past values to predict the future values.


Temporal Graph Convolutional Network for Urban Traffic Flow Prediction Method

arXiv.org Machine Learning

Accurate and real-time traffic forecasting plays an important role in the Intelligent Traffic System (ITS), it is of great significance for urban traffic planning, traffic management, and traffic control. However, traffic forecasting has always been a concerned open scientific issue, owing to the constraint of urban road network topological structure and the law of dynamic change with time, namely spatial dependence and temporal dependence. In order to capture the spatial and temporal dependence simultaneously, we propose a novel neural network-based traffic forecasting method, temporal graph convolutional network (T-GCN) model, which is in combination with the graph convolutional network (GCN) and gated recurrent unit (GRU). Specifically, the graph convolutional network is used to learn the complex topological structure to capture the spatial dependence and the gated recurrent unit is used to learn the dynamic change of traffic flow to capture the temporal dependence. And then, the T-GCN model is employed to realize the traffic forecasting task based on urban road network. Experiments demonstrate that our T-GCN model can obtain the spatio temporal correlation from traffic data and the prediction effects outperform state-of-art baselines on real-world traffic datasets.


Adapting multi-armed bandits policies to contextual bandits scenarios

arXiv.org Machine Learning

This work explores adaptations of successful multi-armed bandits policies to the online contextual bandits scenario with binary rewards using binary classification algorithms such as logistic regression as black-box oracles. Some of these adaptations are achieved through bootstrapping or approximate bootstrapping, while others rely on other forms of randomness, resulting in more scalable approaches than previous works, and the ability to work with any type of classification algorithm. In particular, the Adaptive-Greedy algorithm shows a lot of promise, in many cases achieving better performance than upper confidence bound and Thompson sampling strategies, at the expense of more hyperparameters to tune.


Kernel Regression for Graph Signal Prediction in Presence of Sparse Noise

arXiv.org Machine Learning

In presence of sparse noise we propose kernel regression for predicting output vectors which are smooth over a given graph. Sparse noise models the training outputs being corrupted either with missing samples or large perturbations. The presence of sparse noise is handled using appropriate use of $\ell_1$-norm along-with use of $\ell_2$-norm in a convex cost function. For optimization of the cost function, we propose an iteratively reweighted least-squares (IRLS) approach that is suitable for kernel substitution or kernel trick due to availability of a closed form solution. Simulations using real-world temperature data show efficacy of our proposed method, mainly for limited-size training datasets.


Stacked Penalized Logistic Regression for Selecting Views in Multi-View Learning

arXiv.org Machine Learning

In multi-view learning, features are organized into multiple sets called views. Multi-view stacking (MVS) is an ensemble learning framework which learns a prediction function from each view separately, and then learns a meta-function which optimally combines the view-specific predictions. In case studies, MVS has been shown to increase prediction accuracy. However, the framework can also be used for selecting a subset of important views. We propose a method for selecting views based on MVS, which we call stacked penalized logistic regression (StaPLR). Compared to existing view-selection methods like the group lasso, StaPLR can make use of faster optimization algorithms and is easily parallelized. We show that nonnegativity constraints on the parameters of the function which combines the views are important for preventing unimportant views from entering the model. We investigate the view selection and classification performance of StaPLR and the group lasso through simulations, and consider two real data examples. We observe that StaPLR is less likely to select irrelevant views, leading to models that are sparser at the view level, but which have comparable or increased predictive performance.


Robust Text Classification under Confounding Shift

Journal of Artificial Intelligence Research

As statistical classifiers become integrated into real-world applications, it is important to consider not only their accuracy but also their robustness to changes in the data distribution. Although identifying and controlling for confounding variables Z - correlated with both the input X of a classifier and its output Y - has been assiduously studied in empirical social science, it is often neglected in text classification. This can be understood by the fact that, if we assume that the impact of confounding variables does not change between the time we fit a model and the time we use it, then prediction accuracy should only be slightly affected. We show in this paper that this assumption often does not hold and that when the influence of a confounding variable changes from training time to prediction time (i.e. under confounding shift), the classifier accuracy can degrade rapidly. We use Pearl's back-door adjustment as a predictive framework to develop a model robust to confounding shift under the condition that Z is observed at training time. Our approach does not make any causal conclusions but by experimenting on 6 datasets, we show that our approach is able to outperform baselines 1) in controlled cases where confounding shift is manually injected between fitting time and prediction time 2) in natural experiments where confounding shift appears either abruptly or gradually 3) in cases where there is one or multiple confounders. Finally, we discuss multiple issues we encountered during this research such as the effect of noise in the observation of Z and the importance of only controlling for confounding variables.