Optimization
Do Current Multi-Task Optimization Methods in Deep Learning Even Help?
Xin, Derrick, Ghorbani, Behrooz, Garg, Ankush, Firat, Orhan, Gilmer, Justin
Recent research has proposed a series of specialized optimization algorithms for deep multi-task models. It is often claimed that these multi-task optimization (MTO) methods yield solutions that are superior to the ones found by simply optimizing a weighted average of the task losses. In this paper, we perform large-scale experiments on a variety of language and vision tasks to examine the empirical validity of these claims. We show that, despite the added design and computational complexity of these algorithms, MTO methods do not yield any performance improvements beyond what is achievable via traditional optimization approaches. We highlight alternative strategies that consistently yield improvements to the performance profile and point out common training pitfalls that might cause suboptimal results. Finally, we outline challenges in reliably evaluating the performance of MTO algorithms and discuss potential solutions.
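The baseline this abstract refers to, optimizing a weighted average of the task losses, can be illustrated on a toy problem. The sketch below is not taken from the paper: the two quadratic task losses, the weights, and the function name `minimize_weighted_average` are assumptions chosen purely for illustration.

```python
def minimize_weighted_average(targets, weights, lr=0.1, steps=500):
    """Gradient descent on the weighted average of toy losses (x - t_i)^2."""
    total = sum(weights)
    x = 0.0
    for _ in range(steps):
        # Gradient of sum_i w_i * (x - t_i)^2 / sum_i w_i with respect to x.
        grad = sum(w * 2.0 * (x - t) for w, t in zip(weights, targets)) / total
        x -= lr * grad
    return x


# Two hypothetical tasks with minimizers at 1.0 and 3.0, weighted 1:3.
x_star = minimize_weighted_average(targets=[1.0, 3.0], weights=[0.25, 0.75])
```

For these quadratic losses the minimizer is the weighted mean of the task targets, so changing the weights moves the solution along the trade-off between the two tasks.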
Optimization with Constraint Learning: A Framework and Survey
Fajemisin, Adejuyigbe, Maragno, Donato, Hertog, Dick den
Many real-life optimization problems contain one or more constraints or objectives for which there are no explicit formulas. If data are available, however, they can be used to learn the constraints. The benefits of this approach are clear, but the process needs to be carried out in a structured manner. This paper therefore provides a framework for Optimization with Constraint Learning (OCL) which we believe will help to formalize and direct the process of learning constraints from data. This framework includes the following steps: (i) setup of the conceptual optimization model, (ii) data gathering and preprocessing, (iii) selection and training of predictive models, (iv) resolution of the optimization model, and (v) verification and improvement of the optimization model. We then review the recent OCL literature in light of this framework, and highlight current trends, as well as areas for future research.
IGN : Implicit Generative Networks
Luo, Haozheng, Wu, Tianyi, Han, Colin Feiyu, Yan, Zhijun
In this work, we build on recent advances in distributional reinforcement learning to give a state-of-the-art distributional variant of the model based on the IQN. We achieve this by using the GAN model's generator and discriminator functions with quantile regression to approximate the full quantile value for the state-action return distribution. We demonstrate improved performance on our baseline dataset, the 57 Atari 2600 games in the ALE. We also use our algorithm to show state-of-the-art training performance of risk-sensitive policies in Atari games with policy optimization and evaluation.
Leveraging Joint-Diagonalization in Transform-Learning NMF
Zhang, Sixin, Soubies, Emmanuel, Févotte, Cédric
Non-negative matrix factorization with transform learning (TL-NMF) is a recent idea that aims at learning data representations suited to NMF. In this work, we relate TL-NMF to the classical matrix joint-diagonalization (JD) problem. We show that, when the number of data realizations is sufficiently large, TL-NMF can be replaced by a two-step approach, termed JD+NMF, that estimates the transform through JD prior to NMF computation. In contrast, we find that when the number of data realizations is limited, not only is JD+NMF no longer equivalent to TL-NMF, but the inherent low-rank constraint of TL-NMF turns out to be an essential ingredient for learning meaningful transforms for NMF.
Unsupervised Alignment of Distributional Word Embeddings
Diallo, Aissatou, Fürnkranz, Johannes
Cross-domain alignment plays a key role in tasks ranging from image-text retrieval to machine translation. The main objective is to associate related entities across different domains. Recently, purely unsupervised methods operating on monolingual embeddings have successfully been used to infer a bilingual lexicon without relying on supervision. However, current state-of-the-art methods focus only on point vectors, although distributional embeddings have been shown to embed richer semantic information when representing words. This paper investigates a novel stochastic optimization approach for aligning distributional word embeddings. Our method builds upon techniques in optimal transport to resolve the cross-domain matching problem in a principled manner. We evaluate our method on the problem of unsupervised word translation, by aligning word embeddings trained on monolingual data. We present empirical evidence to demonstrate the validity of our approach to the bilingual lexicon induction task across several language pairs.
In Differential Privacy, There is Truth: On Vote Leakage in Ensemble Private Learning
Wang, Jiaqi, Schuster, Roei, Shumailov, Ilia, Lie, David, Papernot, Nicolas
When learning from sensitive data, care must be taken to ensure that training algorithms address privacy concerns. The canonical Private Aggregation of Teacher Ensembles, or PATE, computes output labels by aggregating the predictions of a (possibly distributed) collection of teacher models via a voting mechanism. The mechanism adds noise to attain a differential privacy guarantee with respect to the teachers' training data. In this work, we observe that this use of noise, which makes PATE predictions stochastic, enables new forms of leakage of sensitive information. For a given input, our adversary exploits this stochasticity to extract high-fidelity histograms of the votes submitted by the underlying teachers. From these histograms, the adversary can learn sensitive attributes of the input such as race, gender, or age. Although this attack does not directly violate the differential privacy guarantee, it clearly violates privacy norms and expectations, and would not be possible at all without the noise inserted to obtain differential privacy. In fact, counter-intuitively, the attack becomes easier as we add more noise to provide stronger differential privacy. We hope this encourages future work to consider privacy holistically rather than treat differential privacy as a panacea.
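The stochasticity the abstract describes can be seen in a small simulation of a noisy voting mechanism. This is a minimal sketch under stated assumptions, not the paper's attack or the reference PATE implementation: it assumes Laplace noise added to per-class vote counts followed by an argmax, and the names `laplace` and `noisy_argmax`, the vote counts, and the noise scale are all hypothetical.

```python
import random
from collections import Counter


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace distribution with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_argmax(teacher_votes, num_classes, scale, rng):
    """Return the class whose noise-perturbed vote count is largest."""
    counts = Counter(teacher_votes)
    noisy = [counts.get(c, 0) + laplace(scale, rng) for c in range(num_classes)]
    return max(range(num_classes), key=lambda c: noisy[c])


rng = random.Random(0)
votes = [0] * 6 + [1] * 3 + [2] * 1  # hypothetical votes from 10 teachers
answers = [noisy_argmax(votes, 3, scale=3.0, rng=rng) for _ in range(2000)]
freq = Counter(answers)
```

Repeating the same query yields different labels, and the frequency of each label tracks the underlying vote counts. That dependence is the kind of signal the abstract says an adversary can exploit to recover vote histograms.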
A Validation Approach to Over-parameterized Matrix and Image Recovery
Ding, Lijun, Qin, Zhen, Jiang, Liwei, Zhou, Jinxin, Zhu, Zhihui
In this paper, we study the problem of recovering a low-rank matrix from a number of noisy random linear measurements. We consider the setting where the rank of the ground-truth matrix is unknown a priori and use an overspecified factored representation of the matrix variable, where the global optimal solutions overfit and do not correspond to the underlying ground truth. We then solve the associated nonconvex problem using gradient descent with small random initialization. We show that as long as the measurement operators satisfy the restricted isometry property (RIP) with its rank parameter scaling with the rank of the ground-truth matrix rather than with the overspecified matrix variable, gradient descent iterations are on a particular trajectory towards the ground-truth matrix and achieve nearly information-theoretically optimal recovery when stopped appropriately. We then propose an efficient early stopping strategy based on the common hold-out method and show that it provably detects a nearly optimal estimator. Moreover, experiments show that the proposed validation approach can also be efficiently used for image restoration with deep image prior, which over-parameterizes an image with a deep network.
Efficient inspection of underground galleries using k robots with limited energy
Bereg, Sergey, Caraballo, L. Evaristo, Díaz-Báñez, José Miguel
We study the problem of optimally inspecting an underground (underwater) gallery with k agents. We consider a gallery with a single opening and with a tree topology rooted at the opening. Due to the small diameter of the pipes (caves), the agents are small robots with limited autonomy and there is a supply station at the gallery's opening. Therefore, they are initially placed at the root and periodically need to return to the supply station. Our goal is to design off-line strategies to efficiently cover the tree with $k$ small robots. We consider two objective functions: the covering time (maximum collective time) and the covering distance (total traveled distance). The maximum collective time is the maximum time a robot needs to finish its assigned task (assuming that all the robots start at the same time); the total traveled distance is the sum of the lengths of all the covering walks. Since the problems are intractable for large trees, we propose approximation algorithms. Both the efficiency and the accuracy of the suboptimal solutions are shown empirically for random trees through intensive numerical experiments.
Social-Inverse: Inverse Decision-making of Social Contagion Management with Task Migrations
Considering two decision-making tasks $A$ and $B$, each of which wishes to compute an effective decision $Y$ for a given query $X$, can we solve task $B$ by using query-decision pairs $(X, Y)$ of $A$ without knowing the latent decision-making model? Such problems, called inverse decision-making with task migrations, are of interest in that the complex and stochastic nature of real-world applications often prevents the agent from completely knowing the underlying system. In this paper, we introduce such a new problem with formal formulations and present a generic framework for addressing decision-making tasks in social contagion management. On the theory side, we present a generalization analysis for justifying the learning performance of our framework. In empirical studies, we perform a sanity check and compare the presented method with other possible learning-based and graph-based methods. We have acquired promising experimental results, confirming for the first time that it is possible to solve one decision-making task by using the solutions associated with another one.
SoLar: Sinkhorn Label Refinery for Imbalanced Partial-Label Learning
Wang, Haobo, Xia, Mingxuan, Li, Yixuan, Mao, Yuren, Feng, Lei, Chen, Gang, Zhao, Junbo
Partial-label learning (PLL) is a peculiar weakly-supervised learning task where the training samples are generally associated with a set of candidate labels instead of a single ground truth. While a variety of label disambiguation methods have been proposed in this domain, they normally assume a class-balanced scenario that may not hold in many real-world applications. Empirically, we observe degraded performance of the prior methods when facing the combined challenge of a long-tailed distribution and partial labeling. In this work, we first identify the major reasons why the prior work fails. We subsequently propose SoLar, a novel Optimal Transport-based framework that refines the disambiguated labels towards matching the marginal class prior distribution. SoLar additionally incorporates a new and systematic mechanism for estimating the long-tailed class prior distribution under the PLL setup. Through extensive experiments, SoLar exhibits substantially superior results on standardized benchmarks compared to the previous state-of-the-art PLL methods. Code and data are available at: https://github.com/hbzju/SoLar .
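The idea of refining labels so that they match a marginal class prior can be illustrated with generic Sinkhorn-Knopp scaling. The sketch below is not the SoLar implementation (see the repository above for that): it omits candidate-label sets entirely, and the function name `sinkhorn`, the confidence scores, and the class prior are hypothetical values chosen for illustration.

```python
def sinkhorn(scores, row_marginal, col_marginal, iters=200):
    """Scale a positive matrix so its row and column sums match the marginals."""
    plan = [row[:] for row in scores]
    n, m = len(plan), len(plan[0])
    for _ in range(iters):
        # Rescale each row so it sums to its target row mass.
        for i in range(n):
            s = sum(plan[i])
            plan[i] = [v * row_marginal[i] / s for v in plan[i]]
        # Rescale each column so it sums to its target class mass.
        for j in range(m):
            s = sum(plan[i][j] for i in range(n))
            for i in range(n):
                plan[i][j] *= col_marginal[j] / s
    return plan


# Hypothetical model confidences for 4 samples over 2 classes.
scores = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.4, 0.6]]
rows = [0.25] * 4      # each sample carries equal mass
prior = [0.75, 0.25]   # assumed long-tailed class prior
plan = sinkhorn(scores, rows, prior)
col_sums = [sum(plan[i][j] for i in range(4)) for j in range(2)]
row_sums = [sum(r) for r in plan]
```

Multiplying each row of `plan` by 4 gives per-sample soft labels whose average over the samples agrees with the assumed prior.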