Tran-The, Hung
Neural-BO: A Black-box Optimization Algorithm using Deep Neural Networks
Phan-Trong, Dat, Tran-The, Hung, Gupta, Sunil
Bayesian Optimization (BO) is an effective approach for global optimization of black-box functions when function evaluations are expensive. Most prior works use Gaussian processes to model the black-box function; however, the use of kernels in Gaussian processes leads to two problems: first, kernel-based methods scale poorly with the number of data points, and second, kernel methods are usually not effective on complex, structured, high-dimensional data due to the curse of dimensionality. We therefore propose a novel black-box optimization algorithm in which the black-box function is modeled by a neural network. Our algorithm does not need a Bayesian neural network to estimate predictive uncertainty and is therefore computationally favorable. We analyze the theoretical behavior of our algorithm in terms of a regret bound using advances in NTK theory, showing its efficient convergence. We perform experiments on both synthetic and real-world optimization tasks and show that our algorithm is more sample efficient than existing methods.
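For illustration, a minimal sketch of a BO loop with a neural-network surrogate is given below. The UCB-style acquisition and the distance-based uncertainty proxy are simplifying assumptions, not the paper's NTK-based estimate; all function names are hypothetical.

```python
# Illustrative sketch only: a BO loop with a neural-network surrogate in
# place of a GP. The uncertainty proxy (distance to observed points) is a
# stand-in for the paper's NTK-based estimate.
import numpy as np
import torch
import torch.nn as nn

def fit_surrogate(X, y, epochs=200):
    """Fit a small MLP to the observed (x, f(x)) pairs."""
    model = nn.Sequential(nn.Linear(X.shape[1], 64), nn.ReLU(),
                          nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 1))
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    Xt = torch.as_tensor(X, dtype=torch.float32)
    yt = torch.as_tensor(y, dtype=torch.float32)
    for _ in range(epochs):
        opt.zero_grad()
        loss = ((model(Xt).squeeze(-1) - yt) ** 2).mean()
        loss.backward()
        opt.step()
    return model

def neural_bo_step(f, X, y, bounds, n_cand=512, beta=1.0):
    """Pick the next query point by a UCB-style acquisition over candidates."""
    model = fit_surrogate(X, y)
    cand = np.random.uniform(bounds[0], bounds[1], size=(n_cand, X.shape[1]))
    with torch.no_grad():
        mean = model(torch.as_tensor(cand, dtype=torch.float32)).squeeze(-1).numpy()
    # Crude uncertainty proxy: distance to the nearest observed point.
    sigma = np.min(np.linalg.norm(cand[:, None, :] - X[None, :, :], axis=-1), axis=1)
    x_next = cand[np.argmax(mean + beta * sigma)]
    return np.vstack([X, x_next]), np.append(y, f(x_next))
```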
Neural Collapse in Deep Linear Networks: From Balanced to Imbalanced Data
Dang, Hien, Tran, Tho, Osher, Stanley, Tran-The, Hung, Ho, Nhat, Nguyen, Tan
Modern deep neural networks have achieved impressive performance on tasks from image classification to natural language processing. Surprisingly, these complex systems with massive amounts of parameters exhibit the same structural properties in their last-layer features and classifiers across canonical datasets when training until convergence. In particular, it has been observed that the last-layer features collapse to their class-means, and those class-means are the vertices of a simplex Equiangular Tight Frame (ETF). This phenomenon is known as Neural Collapse (NC). Recent papers have theoretically shown that NC emerges in the global minimizers of training problems with the simplified "unconstrained feature model".
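To make the ETF geometry concrete, the following snippet builds a standard simplex ETF for K classes and verifies its defining property: every pair of class-mean directions meets at cosine similarity exactly -1/(K-1). This is a textbook construction, not code from the paper.

```python
# A standard simplex-ETF construction (illustrative, not from the paper):
# K vectors whose pairwise cosine similarity is exactly -1/(K-1), the
# geometry the collapsed class-means are observed to form.
import numpy as np

K = 4
M = np.sqrt(K / (K - 1)) * (np.eye(K) - np.ones((K, K)) / K)  # rows = ETF vectors
norms = np.linalg.norm(M, axis=1)
cos = (M @ M.T) / (norms[:, None] * norms[None, :])
off_diag = cos[~np.eye(K, dtype=bool)]
print(np.allclose(off_diag, -1 / (K - 1)))  # True: equiangular at -1/(K-1)
```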
Combining Online Learning and Offline Learning for Contextual Bandits with Deficient Support
Tran-The, Hung, Gupta, Sunil, Nguyen-Tang, Thanh, Rana, Santu, Venkatesh, Svetha
We address policy learning with logged data in contextual bandits. Current offline policy learning algorithms are mostly based on inverse propensity score (IPS) weighting, which requires the logging policy to have \emph{full support}, i.e., a non-zero probability for any context/action pair of the evaluation policy. However, many real-world systems do not guarantee such logging policies, especially when the action space is large and many actions have poor or missing rewards. With such \emph{support deficiency}, offline learning fails to find an optimal policy. We propose a novel approach that combines offline learning with online exploration: online exploration is used to explore actions unsupported in the logged data, whilst offline learning exploits the supported actions, avoiding unnecessary exploration. Our approach determines an optimal policy with theoretical guarantees using the minimal number of online explorations. We demonstrate our algorithms' effectiveness empirically on a diverse collection of datasets.
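A schematic of the hybrid idea, under illustrative assumptions (the support threshold, budget split, and helper names are hypothetical, not the paper's algorithm): exploit supported actions via IPS estimates from the logs, and spend online pulls only on unsupported actions.

```python
# Schematic of the hybrid offline/online idea (illustrative only): IPS
# estimates for actions the logging policy supports, online pulls for the rest.
import numpy as np

def hybrid_policy_value(logs, pull, n_actions, support_eps=0.05, online_budget=100):
    """logs: list of (action, reward, logging_prob); pull(a) -> online reward."""
    ips_sum = np.zeros(n_actions)
    supported = np.zeros(n_actions, dtype=bool)
    for a, r, p in logs:
        if p >= support_eps:          # action is adequately supported offline
            supported[a] = True
            ips_sum[a] += r / p       # inverse-propensity-weighted reward
    est = ips_sum / len(logs)         # offline IPS value estimate per action
    # Spend the online budget only on actions the logged data cannot cover.
    unsupported = np.flatnonzero(~supported)
    for a in unsupported:
        rewards = [pull(a) for _ in range(online_budget // max(1, len(unsupported)))]
        est[a] = np.mean(rewards) if rewards else -np.inf
    return int(np.argmax(est))        # best action under the combined estimate
```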
Bayesian Optimistic Optimisation with Exponentially Decaying Regret
Tran-The, Hung, Gupta, Sunil, Rana, Santu, Venkatesh, Svetha
Bayesian optimisation (BO) is a well-known efficient algorithm for finding the global optimum of expensive, black-box functions. Current practical BO algorithms have regret bounds ranging from $\mathcal{O}(\frac{\log N}{\sqrt{N}})$ to $\mathcal{O}(e^{-\sqrt{N}})$, where $N$ is the number of evaluations. This paper explores the possibility of improving the regret bound in the noiseless setting by intertwining concepts from BO and tree-based optimistic optimisation, which is based on partitioning the search space. We propose the BOO algorithm, a first practical approach that can achieve an exponential regret bound of order $\mathcal{O}(N^{-\sqrt{N}})$ under the assumption that the objective function is sampled from a Gaussian process with a Mat\'ern kernel with smoothness parameter $\nu > 4 + \frac{D}{2}$, where $D$ is the number of dimensions. We perform experiments on the optimisation of various synthetic functions and machine learning hyperparameter tuning tasks and show that our algorithm outperforms baselines.
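The space-partitioning side of this idea can be illustrated with a generic optimistic tree search in one dimension (in the spirit of DOO); the Lipschitz-style bonus below is an illustrative stand-in for the GP-based scores BOO actually uses.

```python
# Generic optimistic tree search on [0, 1] (illustrative, not BOO itself):
# repeatedly split the cell with the highest optimistic upper bound.
import heapq

def tree_optimistic_maximise(f, n_evals=50, lipschitz=10.0):
    # Each cell (lo, hi) is scored by f(midpoint) + L * radius, an
    # optimistic upper bound on f anywhere inside the cell.
    mid = 0.5
    best_x, best_y = mid, f(mid)
    heap = [(-(best_y + lipschitz * 0.5), 0.0, 1.0, mid)]
    for _ in range(n_evals):
        _, lo, hi, _ = heapq.heappop(heap)          # most optimistic cell
        for a, b in ((lo, (lo + hi) / 2), ((lo + hi) / 2, hi)):
            m = (a + b) / 2
            y = f(m)
            if y > best_y:
                best_x, best_y = m, y
            heapq.heappush(heap, (-(y + lipschitz * (b - a) / 2), a, b, m))
    return best_x, best_y
```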
On Finite-Sample Analysis of Offline Reinforcement Learning with Deep ReLU Networks
Nguyen-Tang, Thanh, Gupta, Sunil, Tran-The, Hung, Venkatesh, Svetha
This paper studies the statistical theory of offline reinforcement learning with deep ReLU networks. We consider the off-policy evaluation (OPE) problem, where the goal is to estimate the expected discounted reward of a target policy given logged data generated by unknown behaviour policies. We study a regression-based fitted Q evaluation (FQE) method using deep ReLU networks and characterize a finite-sample bound on the estimation error of this method under mild assumptions. Prior works on OPE with either general function approximation or deep ReLU networks ignore the data-dependent structure in the algorithm, dodging the technical bottleneck of OPE, while requiring a rather restrictive regularity assumption. In this work, we overcome these limitations and provide a comprehensive analysis of OPE with deep ReLU networks. In particular, we precisely quantify how the distribution shift of the offline data, the dimension of the input space, and the regularity of the system control the OPE estimation error. Consequently, we provide insights into the interplay between offline reinforcement learning and deep learning.
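A minimal FQE loop of the kind analyzed here can be sketched as follows; the network width, optimizer, and training schedule are assumptions for illustration, not the paper's exact configuration.

```python
# Minimal fitted-Q-evaluation (FQE) sketch with a ReLU network (illustrative;
# hyperparameters and architecture are assumptions, not the paper's setup).
import torch
import torch.nn as nn

def fqe(dataset, pi, state_dim, n_actions, gamma=0.99, iters=30, epochs=100):
    """dataset: tensors (s, a, r, s2); pi(s2) -> target-policy actions (long)."""
    s, a, r, s2 = dataset
    q = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU(), nn.Linear(128, n_actions))
    opt = torch.optim.Adam(q.parameters(), lr=1e-3)
    for _ in range(iters):                         # outer FQE iterations
        with torch.no_grad():                      # freeze regression targets
            target = r + gamma * q(s2).gather(1, pi(s2).unsqueeze(1)).squeeze(1)
        for _ in range(epochs):                    # fit Q to the frozen targets
            pred = q(s).gather(1, a.unsqueeze(1)).squeeze(1)
            loss = ((pred - target) ** 2).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
    return q  # Q^pi estimate; the policy value is E[Q(s0, pi(s0))]
```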
Sub-linear Regret Bounds for Bayesian Optimisation in Unknown Search Spaces
Tran-The, Hung, Gupta, Sunil, Rana, Santu, Ha, Huong, Venkatesh, Svetha
Bayesian optimisation (BO) is a popular method for efficient optimisation of expensive black-box functions. Traditionally, BO assumes that the search space is known. However, in many problems this assumption does not hold. To this end, we propose a novel BO algorithm which expands (and shifts) the search space over iterations by controlling the expansion rate through a hyperharmonic series. Further, we propose another variant of our algorithm that scales to high dimensions. We show theoretically that for both our algorithms, the cumulative regret grows at a sub-linear rate. Our experiments with synthetic and real-world optimisation tasks demonstrate the superiority of our algorithms over current state-of-the-art methods for Bayesian optimisation in unknown search spaces.
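A hedged sketch of the expansion mechanism: grow the search box each iteration by a hyperharmonic increment 1/t^p with p > 1, so the cumulative expansion stays bounded. The schedule constants and the inner BO step (bo_step) are illustrative assumptions, not the paper's derived rates.

```python
# Illustrative expanding-search-space loop: the box grows by 1/t**p per
# iteration (p > 1 keeps total growth finite); bo_step is any BO acquisition.
import numpy as np

def expanding_search_bo(f, dim, bo_step, T=100, p=1.5, r0=1.0):
    lo, hi = -r0 * np.ones(dim), r0 * np.ones(dim)   # initial guessed box
    X, y = np.empty((0, dim)), np.empty(0)
    for t in range(1, T + 1):
        growth = 1.0 / t ** p                         # hyperharmonic increment
        lo, hi = lo - growth, hi + growth             # expand (and could shift)
        x_next = bo_step(X, y, (lo, hi))              # inner BO proposes a point
        X, y = np.vstack([X, x_next]), np.append(y, f(x_next))
    return X[np.argmax(y)], y.max()
```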
Bayesian Optimization with Unknown Search Space
Ha, Huong, Rana, Santu, Gupta, Sunil, Nguyen, Thanh, Tran-The, Hung, Venkatesh, Svetha
Applying Bayesian optimization to problems where the search space is unknown is challenging. To address this problem, we propose a systematic volume expansion strategy for Bayesian optimization. We devise a strategy guaranteeing that, in iterative expansions of the search space, our method can find a point whose function value is within epsilon of the objective function's maximum. Without the need to specify any parameters, our algorithm automatically triggers the minimal required expansion at each iteration. We derive analytic expressions for when to trigger an expansion and by how much to expand. We also provide theoretical analysis showing that our method achieves epsilon-accuracy after a finite number of iterations. We evaluate our method on both benchmark test functions and machine learning hyper-parameter tuning tasks and demonstrate that it outperforms baselines.
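One way to picture an automatically triggered expansion (purely illustrative; the paper derives analytic trigger expressions rather than using the sampling test below): expand only when an optimistic acquisition value outside the current box beats the interior by more than epsilon.

```python
# Illustrative epsilon-triggered expansion test, not the paper's analytic rule:
# expand the box only when the best optimistic value outside it exceeds the
# best interior value by epsilon.
import numpy as np

def maybe_expand(acq, bounds, eps, grow=0.5, n_cand=256):
    """acq(x) -> optimistic value estimate; bounds = (lo, hi) arrays."""
    lo, hi = bounds
    interior = np.random.uniform(lo, hi, size=(n_cand, lo.size))
    shell = np.random.uniform(lo - grow, hi + grow, size=(n_cand, lo.size))
    outside = shell[np.any((shell < lo) | (shell > hi), axis=1)]
    best_in = max(acq(x) for x in interior)
    best_out = max((acq(x) for x in outside), default=-np.inf)
    if best_out > best_in + eps:            # optimum may lie outside: expand
        return lo - grow, hi + grow
    return lo, hi                           # expansion not yet required
```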