adaptive scheme
ADMM without a Fixed Penalty Parameter: Faster Convergence with New Adaptive Penalization
Alternating direction method of multipliers (ADMM) has received tremendous interest for solving numerous problems in machine learning, statistics and signal processing. However, it is known that the performance of ADMM and many of its variants is very sensitive to the penalty parameter of a quadratic penalty applied to the equality constraints. Although several approaches have been proposed for dynamically changing this parameter during the course of optimization, they do not yield theoretical improvement in the convergence rate and are not directly applicable to stochastic ADMM. In this paper, we develop a new ADMM and its linearized variant with a new adaptive scheme to update the penalty parameter. Our methods can be applied under both deterministic and stochastic optimization settings for structured non-smooth objective function. The novelty of the proposed scheme lies at that it is adaptive to a local sharpness property of the objective function, which marks the key difference from previous adaptive scheme that adjusts the penalty parameter per-iteration based on certain conditions on iterates. On theoretical side, given the local sharpness characterized by an exponent $\theta\in(0, 1]$, we show that the proposed ADMM enjoys an improved iteration complexity of $\widetilde O(1/\epsilon^{1-\theta})$\footnote{$\widetilde O()$ suppresses a logarithmic factor.} in the deterministic setting and an iteration complexity of $\widetilde O(1/\epsilon^{2(1-\theta)})$ in the stochastic setting without smoothness and strong convexity assumptions. The complexity in either setting improves that of the standard ADMM which only uses a fixed penalty parameter. On the practical side, we demonstrate that the proposed algorithms converge comparably to, if not much faster than, ADMM with a fine-tuned fixed penalty parameter.
Achieving budget-optimality with adaptive schemes in crowdsourcing
Adaptive schemes, where tasks are assigned based on the data collected thus far, are widely used in practical crowdsourcing systems to efficiently allocate the budget. However, existing theoretical analyses of crowdsourcing systems suggest that the gain of adaptive task assignments is minimal. To bridge this gap, we investigate this question under a strictly more general probabilistic model, which has been recently introduced to model practical crowdsourcing data sets. Under this generalized Dawid-Skene model, we characterize the fundamental trade-off between budget and accuracy, and introduce a novel adaptive scheme that matches this fundamental limit. We further quantify the gain of adaptivity, by comparing the trade-off with the one for non-adaptive schemes, and confirm that the gain is significant and can be made arbitrarily large depending on the distribution of the difficulty level of the tasks at hand.
ADMM without a Fixed Penalty Parameter: Faster Convergence with New Adaptive Penalization
Alternating direction method of multipliers (ADMM) has received tremendous interest for solving numerous problems in machine learning, statistics and signal processing. However, it is known that the performance of ADMM and many of its variants is very sensitive to the penalty parameter of a quadratic penalty applied to the equality constraints. Although several approaches have been proposed for dynamically changing this parameter during the course of optimization, they do not yield theoretical improvement in the convergence rate and are not directly applicable to stochastic ADMM. In this paper, we develop a new ADMM and its linearized variant with a new adaptive scheme to update the penalty parameter. Our methods can be applied under both deterministic and stochastic optimization settings for structured non-smooth objective function. The novelty of the proposed scheme lies at that it is adaptive to a local sharpness property of the objective function, which marks the key difference from previous adaptive scheme that adjusts the penalty parameter per-iteration based on certain conditions on iterates. On theoretical side, given the local sharpness characterized by an exponent $\theta\in(0, 1]$, we show that the proposed ADMM enjoys an improved iteration complexity of $\widetilde O(1/\epsilon^{1-\theta})$\footnote{$\widetilde O()$ suppresses a logarithmic factor.} in the deterministic setting and an iteration complexity of $\widetilde O(1/\epsilon^{2(1-\theta)})$ in the stochastic setting without smoothness and strong convexity assumptions. The complexity in either setting improves that of the standard ADMM which only uses a fixed penalty parameter. On the practical side, we demonstrate that the proposed algorithms converge comparably to, if not much faster than, ADMM with a fine-tuned fixed penalty parameter.
Achieving budget-optimality with adaptive schemes in crowdsourcing
Adaptive schemes, where tasks are assigned based on the data collected thus far, are widely used in practical crowdsourcing systems to efficiently allocate the budget. However, existing theoretical analyses of crowdsourcing systems suggest that the gain of adaptive task assignments is minimal. To bridge this gap, we investigate this question under a strictly more general probabilistic model, which has been recently introduced to model practical crowdsourcing data sets. Under this generalized Dawid-Skene model, we characterize the fundamental trade-off between budget and accuracy, and introduce a novel adaptive scheme that matches this fundamental limit. We further quantify the gain of adaptivity, by comparing the trade-off with the one for non-adaptive schemes, and confirm that the gain is significant and can be made arbitrarily large depending on the distribution of the difficulty level of the tasks at hand.
Reviewer # 1
DQN paper that we applied to all the baselines and our method for the experiment. Therefore, we are certain that we have provided a fair comparison. We apologize for the source of confusion about the update period in Appendix D. We meant "At each " We will correct this to prevent confusion. There have been many recent related works including the ones the Reviewer 1 cited. Unfortunately, we could not provide detailed differences/advances of them all in the limited amount of manuscript.
Adaptive Neural Quantum States: A Recurrent Neural Network Perspective
McNaughton, Jake, Hibat-Allah, Mohamed
Neural-network quantum states (NQS) are powerful neural-network ansätzes that have emerged as promising tools for studying quantum many-body physics through the lens of the variational principle. These architectures are known to be systematically improvable by increasing the number of parameters. Here we demonstrate an Adaptive scheme to optimize NQSs, through the example of recurrent neural networks (RNN), using a fraction of the computation cost while reducing training fluctuations and improving the quality of variational calculations targeting ground states of prototypical models in one- and two-spatial dimensions. This Adaptive technique reduces the computational cost through training small RNNs and reusing them to initialize larger RNNs. This work opens up the possibility for optimizing graphical processing unit (GPU) resources deployed in large-scale NQS simulations.
- North America > Canada > Ontario > Waterloo Region > Waterloo (0.04)
- Oceania > Australia (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Reviews: Local SGD with Periodic Averaging: Tighter Analysis and Adaptive Synchronization
The paper tightens the analysis of local SGD that periodically averages the models at different nodes. It improves the bound on the communication rounds suffice to achieve linear speedups and relaxes some of the assumptions used in previous works. The authors also develop an adaptive scheme to choose the communication rounds based on the intuition from their theoretical results. They also provided empirical results on logistic regression problem with epsilon dataset to support the theoretical results. Comments: The paper is well written and east to follow.
Reviews: Gradient Dynamics of Shallow Univariate ReLU Networks
In this paper, the authors are using a neural network with one hidden layer and one output layer. The activation function is ReLU, and each hidden unit also has a bias term that can be trained. The input is 1-dimensional, i.e., a scalar, so the task is equivalent to interpolation. The loss function is square loss, and the authors use gradient descent to learn the network weights. The authors use an over-parametrization scheme where the number of nodes in the hidden layer tends to infinity, which means the network is infinitely wide. There are two main learning schemes mentioned in this paper: Kernel and adaptive.
Reviews: An Empirical Study on The Properties of Random Bases for Kernel Methods
Summary: The authors provided an empirical study contrasting neural networks and kernel methods, with a focus on how random and adaptive schemes would make efficient use of features in order to improve quality of learning, at four levels of abstraction: data-agnostic random basis (baseline kernel machines with traditional random features), unsupervised data-adaptive basis for better approximation of kernel function, supervised data-label-adaptive basis by kernel target alignment, discriminatively adaptive basis (neural nets). The paper concluded with several suggestions and caveats for efficient use of random features in practice. Comments: - 1 - Line 123, especially for sake of comparing UAB case where the underlying assumption is that using the true kernel function k in prediction yields the "best" performance so that UAB tries to approximate it, I would suggest testing in experiments a baseline model that utilizes the true kernel function k in prediction. Also this would suggest, for example in Figure 1 at which point of the KAE curve the accuracy is sufficiently good (despite many theoretical results available). However, in order to support those conclusions to a definitively convincing extent, more datasets should be needed. For example, the performance scores in Tab. 1 do not seem to be too significantly different marginally for each task.
Partially Adaptive Multichannel Joint Reduction of Ego-noise and Environmental Noise
Fang, Huajian, Wittmer, Niklas, Twiefel, Johannes, Wermter, Stefan, Gerkmann, Timo
Human-robot interaction relies on a noise-robust audio processing module capable of estimating target speech from audio recordings impacted by environmental noise, as well as self-induced noise, so-called ego-noise. While external ambient noise sources vary from environment to environment, ego-noise is mainly caused by the internal motors and joints of a robot. Ego-noise and environmental noise reduction are often decoupled, i.e., ego-noise reduction is performed without considering environmental noise. Recently, a variational autoencoder (VAE)-based speech model has been combined with a fully adaptive non-negative matrix factorization (NMF) noise model to recover clean speech under different environmental noise disturbances. However, its enhancement performance is limited in adverse acoustic scenarios involving, e.g. ego-noise. In this paper, we propose a multichannel partially adaptive scheme to jointly model ego-noise and environmental noise utilizing the VAE-NMF framework, where we take advantage of spatially and spectrally structured characteristics of ego-noise by pre-training the ego-noise model, while retaining the ability to adapt to unknown environmental noise. Experimental results show that our proposed approach outperforms the methods based on a completely fixed scheme and a fully adaptive scheme when ego-noise and environmental noise are present simultaneously.