SpatialRank: Urban Event Ranking with Optimization on Spatiotemporal Data
Urban event ranking aims to identify the locations at highest risk of future events such as traffic accidents or crimes. This problem is of fundamental importance to public safety and urban administration, especially when limited resources are available. The problem is, however, challenging due to complex and dynamic spatiotemporal correlations between locations, the uneven distribution of urban events in space, and the difficulty of correctly ranking nearby locations with similar features. Prior works on event forecasting mostly aim at accurately predicting the actual risk score or count of events at every location. Rankings derived from such predictions are usually of low quality due to prediction errors. Learning-to-rank methods directly optimize measures such as Normalized Discounted Cumulative Gain (NDCG), but cannot handle the spatiotemporal autocorrelation that exists among locations. In this paper, we bridge the gap by proposing a novel spatial event ranking approach named SpatialRank. SpatialRank features adaptive graph convolution layers that dynamically learn the spatiotemporal dependencies across locations from data.
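Since the abstract centers on optimizing NDCG directly, it is worth making the target measure concrete. Below is a minimal NumPy sketch (helper names are illustrative, not from the paper) of how NDCG@k scores a predicted ranking of locations against their actual event counts; note how small prediction errors at near-tied locations can swap ranks and move the score.

```python
import numpy as np

def dcg_at_k(relevances, k):
    """Discounted Cumulative Gain over the top-k positions."""
    rel = np.asarray(relevances, dtype=float)[:k]
    discounts = np.log2(np.arange(2, rel.size + 2))  # log2(rank + 1)
    return float(np.sum(rel / discounts))

def ndcg_at_k(predicted_scores, true_relevances, k):
    """NDCG@k: DCG of the predicted ordering, normalized by the ideal DCG."""
    order = np.argsort(predicted_scores)[::-1]           # rank locations by predicted risk
    dcg = dcg_at_k(np.asarray(true_relevances)[order], k)
    idcg = dcg_at_k(np.sort(true_relevances)[::-1], k)   # best achievable ordering
    return dcg / idcg if idcg > 0 else 0.0

# Toy example: true event counts at 5 locations vs. two sets of predicted risk scores.
true_counts = [3, 0, 1, 2, 0]
print(ndcg_at_k([0.9, 0.1, 0.4, 0.7, 0.2], true_counts, k=3))  # 1.0 (perfect order)
print(ndcg_at_k([0.1, 0.9, 0.2, 0.3, 0.8], true_counts, k=3))  # ~0.21 (poor order)
```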
Reconciling meta-learning and continual learning with online mixtures of tasks
Learning-to-learn or meta-learning leverages data-driven inductive bias to increase the efficiency of learning on a novel task. This approach encounters difficulty when transfer is not advantageous, for instance, when tasks are considerably dissimilar or change over time. We use the connection between gradient-based meta-learning and hierarchical Bayes to propose a Dirichlet process mixture of hierarchical Bayesian models over the parameters of an arbitrary parametric model such as a neural network. In contrast to consolidating inductive biases into a single set of hyperparameters, our approach of task-dependent hyperparameter selection better handles latent distribution shift, as demonstrated on a set of evolving, image-based, few-shot learning benchmarks.
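To make the task-dependent hyperparameter selection concrete, the sketch below shows the Chinese-restaurant-process prior implied by a Dirichlet process mixture: a new task joins an existing cluster of tasks (each with its own meta-learned hyperparameters) with probability proportional to that cluster's size, or opens a fresh cluster with probability proportional to the concentration parameter. Names are illustrative, not the paper's; the full method combines this prior with each component's likelihood on the new task's data.

```python
import numpy as np

def crp_assignment_probs(cluster_sizes, alpha):
    """Prior probabilities of assigning a new task to each existing
    mixture component, or to a fresh one, under a Dirichlet process
    (Chinese restaurant process) prior with concentration alpha."""
    sizes = np.asarray(cluster_sizes, dtype=float)
    probs = np.append(sizes, alpha)  # last entry = brand-new component
    return probs / probs.sum()

# Toy example: three existing task clusters with 5, 2, and 1 tasks.
print(crp_assignment_probs([5, 2, 1], alpha=1.0))
# -> [0.556, 0.222, 0.111, 0.111]; weighting these by each component's
# likelihood on the new task yields the posterior over which set of
# meta-learned hyperparameters to reuse (or whether to start a new set).
```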
Optimal Prediction of the Number of Unseen Species with Multiplicity
Yi Hao
Based on a sample of size n, we consider estimating the number of symbols that appear at least µ times in an independent sample of size a·n, where a is a given parameter. This formulation includes, as a special case, the well-known problem of inferring the number of unseen species introduced by [Fisher et al.] in 1943 and considered by many others since. Of considerable interest in this line of work is the largest a for which the quantity can be accurately predicted. We completely resolve this problem by determining the limit of estimation to be a ≍ (log n)/µ, with lower and upper bounds matching up to constant factors. For the particular case of µ = 1, this implies the recent result of [Orlitsky et al.] on the unseen species problem. Experimental evaluations show that the proposed estimator performs exceptionally well in practice. Furthermore, the estimator is a linear combination of the symbols' empirical counts, and hence linear-time computable.
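For intuition, the classic Good-Toulmin estimator for the µ = 1 case illustrates the "linear combination of empirical counts" structure the abstract refers to; the paper's estimator generalizes this to arbitrary µ and adds smoothing so that prediction stays accurate up to a ≍ (log n)/µ. The sketch below is the textbook estimator, not the paper's.

```python
from collections import Counter

def good_toulmin(sample, a):
    """Classic Good-Toulmin estimate of the number of new (unseen)
    species in a follow-up sample a times the original size.
    This is the mu = 1 special case: a linear combination of the
    prevalences Phi_i (number of species observed exactly i times).
    Without smoothing it is only reliable for a <= 1."""
    phi = Counter(Counter(sample).values())  # phi[i] = # species seen i times
    return -sum((-a) ** i * count for i, count in phi.items())

# Toy example: 9 draws from a skewed distribution over species {a, b, c, d, e}.
sample = list("aaaabbcde")
print(good_toulmin(sample, a=1.0))  # estimates ~1 new species in 9 more draws
```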
Non-Asymptotic Analysis for Two Time-scale TDC with General Smooth Function Approximation
Shaofeng Zou, Department of Electrical Engineering, University at Buffalo
Temporal-difference learning with gradient correction (TDC) is a two time-scale algorithm for policy evaluation in reinforcement learning. The algorithm was initially proposed with linear function approximation and was later extended to general smooth function approximation. The asymptotic convergence for the on-policy setting with general smooth function approximation was established in [Bhatnagar et al., 2009]; however, a non-asymptotic convergence analysis remained open due to challenges posed by the nonlinear, two time-scale update structure, the non-convex objective function, and the projection onto a time-varying tangent plane. In this paper, we develop novel techniques to address these challenges and explicitly characterize the non-asymptotic error bound for the general off-policy setting with i.i.d. or Markovian samples, showing that it converges as fast as O(1/√T) (up to a factor of O(log T)). Our approach can be applied to a wide range of value-based reinforcement learning algorithms with general smooth function approximation.
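For reference, the linear-function-approximation case that the paper generalizes uses the classic two time-scale TDC updates of Sutton et al. The sketch below is illustrative: the fast step size beta exceeds the slow step size alpha, and rho is the importance-sampling ratio for the off-policy setting; the random features at the end merely exercise the update and are not a meaningful MDP.

```python
import numpy as np

def tdc_step(theta, w, phi, phi_next, reward, rho, gamma, alpha, beta):
    """One TDC update with linear function approximation. theta is the
    value-function weight vector (slow time scale, step size alpha);
    w tracks an auxiliary least-squares quantity (fast time scale,
    step size beta > alpha)."""
    delta = reward + gamma * phi_next @ theta - phi @ theta  # TD error
    theta = theta + alpha * rho * (delta * phi - gamma * (w @ phi) * phi_next)
    w = w + beta * rho * (delta - w @ phi) * phi
    return theta, w

d = 4
theta, w = np.zeros(d), np.zeros(d)
rng = np.random.default_rng(0)
for _ in range(1000):  # dummy features, just to exercise the update
    phi, phi_next = rng.normal(size=d), rng.normal(size=d)
    theta, w = tdc_step(theta, w, phi, phi_next, reward=rng.normal(),
                        rho=1.0, gamma=0.9, alpha=0.01, beta=0.05)
```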
Efficient Active Learning for Gaussian Process Classification by Error Reduction
Active learning sequentially selects the best instance for labeling by optimizing an acquisition function, thereby improving data/label efficiency. The selection can be made either from a discrete instance set (the pool-based scenario) or from a continuous instance space (the query synthesis scenario). In this work, we study both active learning scenarios for Gaussian Process Classification (GPC). Existing active learning strategies that maximize the Estimated Error Reduction (EER) aim at reducing the classification error after training with the newly acquired instance in a one-step look-ahead manner. The computation of EER-based acquisition functions is typically prohibitive, as it requires retraining the GPC with every new query.
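The sketch below spells out why the naive EER computation is prohibitive: a one-step look-ahead acquisition must retrain the GPC once per candidate instance and per possible label. It uses scikit-learn's GaussianProcessClassifier for concreteness; the function name and the 0/1-error proxy are illustrative assumptions, not the paper's method, which is precisely about avoiding this brute-force loop.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier

def eer_select(X_train, y_train, X_pool):
    """Brute-force one-step-look-ahead EER for pool-based active learning.
    For each candidate x and each possible label y (weighted by the
    current predictive probability of y), retrain the GPC on the
    augmented set and measure the expected error over the pool; return
    the index of the candidate with the lowest expected future error."""
    gpc = GaussianProcessClassifier().fit(X_train, y_train)
    p = gpc.predict_proba(X_pool)  # current predictive probabilities
    best_i, best_err = None, np.inf
    for i, x in enumerate(X_pool):
        exp_err = 0.0
        for y, py in zip(gpc.classes_, p[i]):
            gpc_plus = GaussianProcessClassifier().fit(
                np.vstack([X_train, x]), np.append(y_train, y))  # retrain per (x, y)!
            q = gpc_plus.predict_proba(X_pool)
            exp_err += py * np.sum(1.0 - q.max(axis=1))  # expected 0/1-error proxy
        if exp_err < best_err:
            best_i, best_err = i, exp_err
    return best_i
```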
Bad Global Minima Exist and SGD Can Reach Them
Dimitris Papailiopoulos, Quebec Artificial Intelligence Institute (Mila), University of Wisconsin-Madison, Université de Montréal
Several works have aimed to explain why overparameterized neural networks generalize well when trained by Stochastic Gradient Descent (SGD). The consensus explanation that has emerged credits the randomized nature of SGD for the bias of the training process towards low-complexity models and, thus, for implicit regularization. We take a careful look at this explanation in the context of image classification with common deep neural network architectures. We find that if we do not regularize explicitly, then SGD can easily be made to converge to poorly-generalizing, high-complexity models: all it takes is to first train on a random labeling of the data before switching to proper training with the correct labels. In contrast, we find that in the presence of explicit regularization, pretraining with random labels has no detrimental effect on SGD. We believe that our results give evidence that explicit regularization plays a far more important role in the success of overparameterized neural networks than has been understood until now. Specifically, by penalizing complicated models independently of their fit to the data, regularization affects training dynamics even far away from optima, making simple models that fit the data well discoverable by local methods such as SGD.
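A minimal sketch of the protocol described above, written for a generic PyTorch classifier: pretrain on randomly labeled data, then train on the correct labels, with weight decay as the explicit regularizer that can be switched off. The function name, hyperparameters, and per-batch label shuffling are illustrative simplifications (the paper fixes one random labeling of the whole training set).

```python
import torch
import torch.nn as nn

def random_init_then_train(model, loader, epochs_rand, epochs_true,
                           weight_decay=0.0, lr=0.1):
    """Phase 1: fit a random labeling; Phase 2: train on the true labels.
    With weight_decay=0 this steers vanilla SGD toward a poorly-generalizing
    minimum; explicit regularization (weight_decay > 0) is reported to
    neutralize the effect. `loader` yields (inputs, labels) batches."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, weight_decay=weight_decay)
    loss_fn = nn.CrossEntropyLoss()
    for phase, n_epochs in (("random", epochs_rand), ("true", epochs_true)):
        for _ in range(n_epochs):
            for x, y in loader:
                if phase == "random":
                    # simplification: shuffle labels within the batch;
                    # the paper uses one fixed random labeling of the set
                    y = y[torch.randperm(y.size(0))]
                opt.zero_grad()
                loss_fn(model(x), y).backward()
                opt.step()
```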
Minimax Optimal Estimation of Approximate Differential Privacy on Neighboring Databases
Differential privacy has become a widely accepted notion of privacy, leading to the introduction and deployment of numerous privatization mechanisms. However, ensuring the privacy guarantee is an error-prone process, both in designing mechanisms and in implementing them. Both types of errors would be greatly reduced if we had a data-driven approach for verifying privacy guarantees given only black-box access to a mechanism. We pose this as a property estimation problem and study the fundamental trade-off between the accuracy of the estimated privacy guarantees and the number of samples required. We introduce a novel estimator that uses polynomial approximation of a carefully chosen degree to optimally trade off bias and variance. With n samples, we show that this estimator matches the performance of a straightforward plug-in estimator that uses n ln n samples, a phenomenon known as sample size amplification. The minimax optimality of the estimator is proved by comparing it to a matching fundamental lower bound.
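For contrast with the paper's polynomial-approximation estimator, the sketch below implements the straightforward plug-in baseline the abstract compares against: estimate the mechanism's output distributions on two neighboring databases empirically and plug them into the standard expression δ(ε) = Σ_o max(P(o) − e^ε Q(o), 0). The function name and the toy mechanism are illustrative assumptions.

```python
import numpy as np
from collections import Counter

def plugin_delta(samples_D, samples_Dprime, epsilon):
    """Plug-in estimate of the additive privacy parameter delta at a given
    epsilon, from black-box samples of a mechanism run on two neighboring
    databases: delta = sum_o max(P(o) - e^eps * Q(o), 0). Per the abstract,
    the paper's estimator attains comparable accuracy with roughly a
    ln(n) factor fewer samples than this naive approach."""
    n, m = len(samples_D), len(samples_Dprime)
    P, Q = Counter(samples_D), Counter(samples_Dprime)
    # Outcomes seen only under Q contribute zero, so summing over P suffices.
    return sum(max(P[o] / n - np.exp(epsilon) * Q[o] / m, 0.0) for o in P)

# Toy example: a randomized-response-style mechanism over outputs {0, 1}.
rng = np.random.default_rng(0)
d = plugin_delta(rng.binomial(1, 0.7, 10000), rng.binomial(1, 0.5, 10000),
                 epsilon=0.1)
print(d)  # ~= 0.7 - e^0.1 * 0.5 ~= 0.147
```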
Supplementary Material for DeWave: Discrete Encoding of EEG Waves for EEG to Text Translation
In this material, we give more technical details as well as additional experiments to support the main paper. The overview of the proposed framework, DeWave, is illustrated in Figure 6. An example decoding result:

Ground truth: Bush attended the University of Texas at Austin, where he graduated Phi Beta Kappa with a Bachelor's degree in Latin American Studies in 1973, taking only two and a half years to complete his work, and obtaining generally excellent grades.

Prediction: was the University of California at Austin in where he studied in Beta Kappa in a degree of degree in history American Studies in 1975.

ZuCo stands for the Zurich Cognitive Language Processing Corpus, a dataset that includes both raw and preprocessed eye-tracking and electroencephalography (EEG) data. The data is collected by having human subjects read given text corpora while their eye-tracking signals and EEG waves are recorded simultaneously.