Goto

Collaborating Authors

 Country


Multiscale Non-stationary Stochastic Bandits

arXiv.org Machine Learning

Classic contextual bandit algorithms for linear models, such as LinUCB, assume that the reward distribution for an arm is modeled by a stationary linear regression. When the linear regression model is non-stationary over time, the regret of LinUCB can scale linearly with time. In this paper, we propose a novel multiscale changepoint detection method for the non-stationary linear bandit problems, called Multiscale-LinUCB, which actively adapts to the changing environment. We also provide theoretical analysis of regret bound for Multiscale-LinUCB algorithm. Experimental results show that our proposed Multiscale-LinUCB algorithm outperforms other state-of-the-art algorithms in non-stationary contextual environments.


Stabilizing Differentiable Architecture Search via Perturbation-based Regularization

arXiv.org Machine Learning

Differentiable architecture search (DARTS) is a prevailing NAS solution to identify architectures. Based on the continuous relaxation of the architecture space, DARTS learns a differentiable architecture weight and largely reduces the search cost. However, its stability and generalizability have been challenged for yielding deteriorating architectures as the search proceeds. We find that the precipitous validation loss landscape, which leads to a dramatic performance drop when distilling the final architecture, is an essential factor that causes instability. Based on this observation, we propose a perturbation-based regularization, named SmoothDARTS (SDARTS), to smooth the loss landscape and improve the generalizability of DARTS. In particular, our new formulations stabilize DARTS by either random smoothing or adversarial attack. The search trajectory on NAS-Bench-1Shot1 demonstrates the effectiveness of our approach and due to the improved stability, we achieve performance gain across various search spaces on 4 datasets. Furthermore, we mathematically show that SDARTS implicitly regularizes the Hessian norm of the validation loss, which accounts for a smoother loss landscape and improved performance. The code is available at https://github.com/xiangning-chen/SmoothDARTS.


Exponential Step Sizes for Non-Convex Optimization

arXiv.org Machine Learning

Stochastic Gradient Descent (SGD) is a popular tool in large scale optimization of machine learning objective functions. However, the performance is greatly variable, depending on the choice of the step sizes. In this paper, we introduce the exponential step sizes for stochastic optimization of smooth non-convex functions which satisfy the Polyak-\L{}ojasiewicz (PL) condition. We show that, without any information on the level of noise over the stochastic gradients, these step sizes guarantee a convergence rate for the last iterate that automatically interpolates between a linear rate (in the noisy-free case) and a $O(\frac{1}{T})$ rate (in the noisy case), up to poly-logarithmic factors. Moreover, if without the PL condition, the exponential step sizes still guarantee optimal convergence to a critical point, up to logarithmic factors. We also validate our theoretical results with empirical experiments on real-world datasets with deep learning architectures.


Shortest path distance approximation using deep learning techniques

arXiv.org Machine Learning

Computing shortest path distances between nodes lies at the heart of many graph algorithms and applications. Traditional exact methods such as breadth-first-search (BFS) do not scale up to contemporary, rapidly evolving today's massive networks. Therefore, it is required to find approximation methods to enable scalable graph processing with a significant speedup. In this paper, we utilize vector embeddings learnt by deep learning techniques to approximate the shortest paths distances in large graphs. We show that a feedforward neural network fed with embeddings can approximate distances with relatively low distortion error. The suggested method is evaluated on the Facebook, BlogCatalog, Youtube and Flickr social networks.


Resolving Spurious Correlations in Causal Models of Environments via Interventions

arXiv.org Machine Learning

Causal models could increase interpretability, robustness to distributional shift and sample efficiency of RL agents. In this vein, we address the question of learning a causal model of an RL environment. This problem is known to be difficult due to spurious correlations. We overcome this difficulty by rewarding an RL agent for designing and executing interventions to discover the true model. We compare rewarding the agent for disproving uncertain edges in the causal graph, rewarding the agent for activating a certain node, or rewarding the agent for increasing the causal graph loss. We show that our methods result in a better causal graph than one generated by following the random policy, or a policy trained on the environment's reward. We find that rewarding for the causal graph loss works the best.


Estimating Uncertainty Intervals from Collaborating Networks

arXiv.org Machine Learning

Effective decision making requires understanding the uncertainty inherent in a prediction. To estimate uncertainty in regression, one could modify a deep neural network to predict coverage intervals, such as by predicting the mean and standard deviation. Unfortunately, in our empirical evaluations the predicted coverage from existing approaches is either overconfident or lacks sharpness (gives imprecise intervals). To address this challenge, we propose a novel method to estimate uncertainty based on two distinct neural networks with two distinct loss functions in a similar vein to Generative Adversarial Networks. Specifically, one network tries to learn the cumulative distribution function, and the second network tries to learn its inverse. Theoretical analysis demonstrates that the idealized solution is a fixed point and that under certain conditions the approach is asymptotically consistent to ground truth. We benchmark the approach on one synthetic and five real-world datasets, including forecasting A1c values in diabetic patients from electronic health records, where uncertainty is critical. In synthetic data, the proposed approach essentially matches the theoretically optimal solution in all aspects. In the real datasets, the proposed approach is empirically more faithful in its coverage estimates and typically gives sharper intervals than competing methods.


Improving automated segmentation of radio shows with audio embeddings

arXiv.org Machine Learning

Audio features have been proven useful for increasing the performance of automated topic segmentation systems. This study explores the novel task of using audio embeddings for automated, topically coherent segmentation of radio shows. We created three different audio embedding generators using multi-class classification tasks on three datasets from different domains. We evaluate topic segmentation performance of the audio embeddings and compare it against a text-only baseline. We find that a set-up including audio embeddings generated through a non-speech sound event classification task significantly outperforms our text-only baseline by 32.3% in F1-measure. In addition, we find that different classification tasks yield audio embeddings that vary in segmentation performance.


Connecting Dualities and Machine Learning

arXiv.org Machine Learning

Dualities are widely used in quantum field theories and string theory to obtain correlation functions at high accuracy. Here we present examples where dual data representations are useful in supervised classification, linking machine learning and typical tasks in theoretical physics. We then discuss how such beneficial representations can be enforced in the latent dimension of neural networks. We find that additional contributions to the loss based on feature separation, feature matching with respect to desired representations, and a good performance on a `simple' correlation function can lead to known and unknown dual representations. This is the first proof of concept that computers can find dualities. We discuss how our examples, based on discrete Fourier transformation and Ising models, connect to other dualities in theoretical physics, for instance Seiberg duality.


Graph Similarity Using PageRank and Persistent Homology

arXiv.org Machine Learning

The PageRank of a graph is a scalar function defined on the node set of the graph which encodes nodes centrality information of the graph. In this work, we utilize the PageRank function on the lower-star filtration of the graph as input to persistent homology to study the problem of graph similarity. By representing each graph as a persistence diagram, we can then compare outputs using the bottleneck distance. We show the effectiveness of our method by utilizing it on two shape mesh datasets.


Learnable Bernoulli Dropout for Bayesian Deep Learning

arXiv.org Machine Learning

In this work, we propose learnable Bernoulli dropout (LBD), a new model-agnostic dropout scheme that considers the dropout rates as parameters jointly optimized with other model parameters. By probabilistic modeling of Bernoulli dropout, our method enables more robust prediction and uncertainty quantification in deep models. Especially, when combined with variational auto-encoders (VAEs), LBD enables flexible semi-implicit posterior representations, leading to new semi-implicit VAE~(SIVAE) models. We solve the optimization for training with respect to the dropout parameters using Augment-REINFORCE-Merge (ARM), an unbiased and low-variance gradient estimator. Our experiments on a range of tasks show the superior performance of our approach compared with other commonly used dropout schemes. Overall, LBD leads to improved accuracy and uncertainty estimates in image classification and semantic segmentation. Moreover, using SIVAE, we can achieve state-of-the-art performance on collaborative filtering for implicit feedback on several public datasets.