AITopics

Classic contextual bandit algorithms for linear models, such as LinUCB, assume that the reward distribution for an arm is modeled by a stationary linear regression. When the linear regression model is non-stationary over time, the regret of LinUCB can scale linearly with time. In this paper, we propose a novel multiscale changepoint detection method for the non-stationary linear bandit problems, called Multiscale-LinUCB, which actively adapts to the changing environment. We also provide theoretical analysis of regret bound for Multiscale-LinUCB algorithm. Experimental results show that our proposed Multiscale-LinUCB algorithm outperforms other state-of-the-art algorithms in non-stationary contextual environments.

algorithm, changepoint, stationary period, (15 more...)

2002.05289

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > California > Yolo County > Davis (0.04)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Data Science > Data Mining > Big Data (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.74)

Chen, Xiangning, Hsieh, Cho-Jui

Stabilizing Differentiable Architecture Search via Perturbation-based Regularization

Differentiable architecture search (DARTS) is a prevailing NAS solution to identify architectures. Based on the continuous relaxation of the architecture space, DARTS learns a differentiable architecture weight and largely reduces the search cost. However, its stability and generalizability have been challenged for yielding deteriorating architectures as the search proceeds. We find that the precipitous validation loss landscape, which leads to a dramatic performance drop when distilling the final architecture, is an essential factor that causes instability. Based on this observation, we propose a perturbation-based regularization, named SmoothDARTS (SDARTS), to smooth the loss landscape and improve the generalizability of DARTS. In particular, our new formulations stabilize DARTS by either random smoothing or adversarial attack. The search trajectory on NAS-Bench-1Shot1 demonstrates the effectiveness of our approach and due to the improved stability, we achieve performance gain across various search spaces on 4 datasets. Furthermore, we mathematically show that SDARTS implicitly regularizes the Hessian norm of the validation loss, which accounts for a smoother loss landscape and improved performance. The code is available at https://github.com/xiangning-chen/SmoothDARTS.

architecture, architecture search, darts, (15 more...)

2002.05283

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Europe > Sweden > Stockholm > Stockholm (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Genre: Research Report (0.40)

Industry: Information Technology (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Li, Xiaoyu, Zhuang, Zhenxun, Orabona, Francesco

Exponential Step Sizes for Non-Convex Optimization

Stochastic Gradient Descent (SGD) is a popular tool in large scale optimization of machine learning objective functions. However, the performance is greatly variable, depending on the choice of the step sizes. In this paper, we introduce the exponential step sizes for stochastic optimization of smooth non-convex functions which satisfy the Polyak-\L{}ojasiewicz (PL) condition. We show that, without any information on the level of noise over the stochastic gradients, these step sizes guarantee a convergence rate for the last iterate that automatically interpolates between a linear rate (in the noisy-free case) and a $O(\frac{1}{T})$ rate (in the noisy case), up to poly-logarithmic factors. Moreover, if without the PL condition, the exponential step sizes still guarantee optimal convergence to a critical point, up to logarithmic factors. We also validate our theoretical results with empirical experiments on real-world datasets with deep learning architectures.

grid, pl condition, step size, (14 more...)

2002.05273

Country:

North America > Canada > Ontario > Toronto (0.14)
Asia > Middle East > Jordan (0.04)
Europe > Hungary > Budapest > Budapest (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Rizi, Fatemeh Salehi, Schloetterer, Joerg, Granitzer, Michael

Shortest path distance approximation using deep learning techniques

Computing shortest path distances between nodes lies at the heart of many graph algorithms and applications. Traditional exact methods such as breadth-first-search (BFS) do not scale up to contemporary, rapidly evolving today's massive networks. Therefore, it is required to find approximation methods to enable scalable graph processing with a significant speedup. In this paper, we utilize vector embeddings learnt by deep learning techniques to approximate the shortest paths distances in large graphs. We show that a feedforward neural network fed with embeddings can approximate distances with relatively low distortion error. The suggested method is evaluated on the Facebook, BlogCatalog, Youtube and Flickr social networks.

graph, node, proceedings, (15 more...)

doi: 10.1109/ASONAM.2018.8508763

2002.05257

Country:

Europe > Germany (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(2 more...)

Genre: Research Report (0.82)

Industry: Information Technology > Services (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Volodin, Sergei, Wichers, Nevan, Nixon, Jeremy

Resolving Spurious Correlations in Causal Models of Environments via Interventions

Causal models could increase interpretability, robustness to distributional shift and sample efficiency of RL agents. In this vein, we address the question of learning a causal model of an RL environment. This problem is known to be difficult due to spurious correlations. We overcome this difficulty by rewarding an RL agent for designing and executing interventions to discover the true model. We compare rewarding the agent for disproving uncertain edges in the causal graph, rewarding the agent for activating a certain node, or rewarding the agent for increasing the causal graph loss. We show that our methods result in a better causal graph than one generated by following the random policy, or a policy trained on the environment's reward. We find that rewarding for the causal graph loss works the best.

causal graph, graph, intervention, (12 more...)

2002.05217

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.83)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Estimating Uncertainty Intervals from Collaborating Networks

Zhou, Tianhui, Li, Yitong, Wu, Yuan, Carlson, David

Effective decision making requires understanding the uncertainty inherent in a prediction. To estimate uncertainty in regression, one could modify a deep neural network to predict coverage intervals, such as by predicting the mean and standard deviation. Unfortunately, in our empirical evaluations the predicted coverage from existing approaches is either overconfident or lacks sharpness (gives imprecise intervals). To address this challenge, we propose a novel method to estimate uncertainty based on two distinct neural networks with two distinct loss functions in a similar vein to Generative Adversarial Networks. Specifically, one network tries to learn the cumulative distribution function, and the second network tries to learn its inverse. Theoretical analysis demonstrates that the idealized solution is a fixed point and that under certain conditions the approach is asymptotically consistent to ground truth. We benchmark the approach on one synthetic and five real-world datasets, including forecasting A1c values in diabetic patients from electronic health records, where uncertainty is critical. In synthetic data, the proposed approach essentially matches the theoretically optimal solution in all aspects. In the real datasets, the proposed approach is empirically more faithful in its coverage estimates and typically gives sharper intervals than competing methods.

conditional distribution, neural network, uncertainty interval, (15 more...)

2002.05212

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > North Carolina > Durham County > Durham (0.04)

Genre:

Research Report > New Finding (0.46)
Research Report > Promising Solution (0.34)

Industry:

Health & Medicine > Health Care Providers & Services (0.92)
Government > Regional Government > North America Government > United States Government (0.92)
Health & Medicine > Health Care Technology > Medical Record (0.68)
Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Berlage, Oberon, Lux, Klaus-Michael, Graus, David

Improving automated segmentation of radio shows with audio embeddings

Audio features have been proven useful for increasing the performance of automated topic segmentation systems. This study explores the novel task of using audio embeddings for automated, topically coherent segmentation of radio shows. We created three different audio embedding generators using multi-class classification tasks on three datasets from different domains. We evaluate topic segmentation performance of the audio embeddings and compare it against a text-only baseline. We find that a set-up including audio embeddings generated through a non-speech sound event classification task significantly outperforms our text-only baseline by 32.3% in F1-measure. In addition, we find that different classification tasks yield audio embeddings that vary in segmentation performance.

classification task, generator, segmentation, (17 more...)

2002.05194

Country:

North America > Canada > Quebec > Montreal (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
Asia > Japan > Kyūshū & Okinawa > Okinawa (0.04)
(7 more...)

Genre: Research Report (1.00)

Industry: Media > Radio (0.86)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Betzler, Philip, Krippendorf, Sven

Connecting Dualities and Machine Learning

Dualities are widely used in quantum field theories and string theory to obtain correlation functions at high accuracy. Here we present examples where dual data representations are useful in supervised classification, linking machine learning and typical tasks in theoretical physics. We then discuss how such beneficial representations can be enforced in the latent dimension of neural networks. We find that additional contributions to the loss based on feature separation, feature matching with respect to desired representations, and a good performance on a `simple' correlation function can lead to known and unknown dual representations. This is the first proof of concept that computers can find dualities. We discuss how our examples, based on discrete Fourier transformation and Ising models, connect to other dualities in theoretical physics, for instance Seiberg duality.

dual representation, representation, transformation, (14 more...)

2002.05169

Country:

Europe > Germany > North Rhine-Westphalia > Upper Bavaria > Munich (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > Greece > Ionian Islands > Corfu (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Hajij, Mustafa, Munch, Elizabeth, Rosen, Paul

Graph Similarity Using PageRank and Persistent Homology

The PageRank of a graph is a scalar function defined on the node set of the graph which encodes nodes centrality information of the graph. In this work, we utilize the PageRank function on the lower-star filtration of the graph as input to persistent homology to study the problem of graph similarity. By representing each graph as a persistence diagram, we can then compare outputs using the bottleneck distance. We show the effectiveness of our method by utilizing it on two shape mesh datasets.

filtration, graph, homology, (10 more...)

2002.05158

Country:

North America > United States > Michigan (0.05)
North America > United States > Florida (0.05)

Genre: Research Report (0.50)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.97)

Boluki, Shahin, Ardywibowo, Randy, Dadaneh, Siamak Zamani, Zhou, Mingyuan, Qian, Xiaoning

Learnable Bernoulli Dropout for Bayesian Deep Learning

In this work, we propose learnable Bernoulli dropout (LBD), a new model-agnostic dropout scheme that considers the dropout rates as parameters jointly optimized with other model parameters. By probabilistic modeling of Bernoulli dropout, our method enables more robust prediction and uncertainty quantification in deep models. Especially, when combined with variational auto-encoders (VAEs), LBD enables flexible semi-implicit posterior representations, leading to new semi-implicit VAE~(SIVAE) models. We solve the optimization for training with respect to the dropout parameters using Augment-REINFORCE-Merge (ARM), an unbiased and low-variance gradient estimator. Our experiments on a range of tasks show the superior performance of our approach compared with other commonly used dropout schemes. Overall, LBD leads to improved accuracy and uncertainty estimates in image classification and semantic segmentation. Moreover, using SIVAE, we can achieve state-of-the-art performance on collaborative filtering for implicit feedback on several public datasets.

dropout, dropout rate, neural network, (11 more...)

2002.05155

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > New York > New York County > New York City (0.04)
(3 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)