Blanchet, Jose, Kang, Yang, Zhang, Fan, Hu, Zhangyi

Distributionally Robust Optimization (DRO) has been shown to provide a flexible framework for decision making under uncertainty and for statistical estimation. For example, recent work in DRO has shown that popular statistical estimators can be interpreted as solutions of suitably formulated data-driven DRO problems. In turn, this connection is used to select tuning parameters optimally through a principled approach informed by robustness considerations. This paper contributes to this growing literature connecting DRO and statistics by showing how boosting algorithms can be studied via DRO. We propose a boosting-type algorithm, named DRO-Boosting, as a procedure to solve our DRO formulation. In particular, DRO-Boosting recovers Adaptive Boosting (AdaBoost), which shows that AdaBoost is effectively solving a DRO problem. We apply our algorithm to a financial dataset on credit card default payment prediction and find that our approach compares favorably with alternative boosting methods widely used in practice.
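One standard way to see why AdaBoost can be read as a worst-case (robust) procedure is that its sample weights are an exponential tilting of the empirical distribution: maximizing the weighted margin loss minus a KL penalty to the uniform distribution gives weights proportional to exp(-y_i F(x_i)), which are exactly AdaBoost's weights. The sketch below is a minimal illustration under that assumption only; the helper names (`tilted_weights`, `fit_stump`, `adaboost`), the KL penalty, and the decision-stump learner are choices made for this example and are not the paper's DRO formulation or its DRO-Boosting algorithm.

```python
import numpy as np

def tilted_weights(losses, lam=1.0):
    """Maximizer of sum_i w_i*losses_i - lam*KL(w || uniform) over the simplex.
    The closed form is w_i proportional to exp(losses_i / lam)."""
    z = losses / lam
    z = z - z.max()                      # subtract max for numerical stability
    w = np.exp(z)
    return w / w.sum()

def fit_stump(X, y, w):
    """Decision stump minimizing weighted 0-1 error; returns (err, feature, threshold, sign)."""
    best = (np.inf, 0, 0.0, 1)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (1, -1):
                pred = np.where(X[:, j] <= t, s, -s)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, j, t, s)
    return best

def stump_predict(X, j, t, s):
    return np.where(X[:, j] <= t, s, -s)

def adaboost(X, y, n_rounds=10):
    """Discrete AdaBoost with labels in {-1, +1}; weights come from tilted_weights."""
    F = np.zeros(len(y))                 # current ensemble score on training points
    model = []
    for _ in range(n_rounds):
        # Worst-case (KL-penalized) reweighting of the margin loss -y*F with lam = 1
        w = tilted_weights(-y * F, lam=1.0)
        err, j, t, s = fit_stump(X, y, w)
        err = np.clip(err, 1e-12, 1 - 1e-12)   # avoid log(0) for a perfect stump
        alpha = 0.5 * np.log((1 - err) / err)
        model.append((alpha, j, t, s))
        F += alpha * stump_predict(X, j, t, s)
    return model

def predict(model, X):
    score = sum(a * stump_predict(X, j, t, s) for a, j, t, s in model)
    return np.sign(score)
```

With `lam=1.0` the tilted weights coincide with AdaBoost's usual multiplicative update w_i ← w_i·exp(-alpha·y_i·h(x_i)) followed by renormalization. Other values of `lam` give a differently tempered reweighting; that variation is an assumption of this sketch, not something stated in the abstract.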

Venkatesan, R. C., Plastino, A.

The theoretical basis for a candidate variational principle for the information bottleneck (IB) method is formulated within the framework of the generalized nonadditive statistics of Tsallis. Given a nonadditivity parameter $ q $, the role of the \textit{additive duality} of nonadditive statistics ($ q^*=2-q $) in relating Tsallis entropies for the ranges $ q < 1 $ and $ q > 1 $ of the nonadditivity parameter is described. Defining $ X $, $ \tilde X $, and $ Y $ to be the source alphabet, the compressed reproduction alphabet, and the \textit{relevance variable}, respectively, it is demonstrated that minimizing a generalized IB (gIB) Lagrangian defined in terms of the nonadditivity parameter $ q^* $ self-consistently yields the \textit{nonadditive effective distortion measure} as the \textit{$ q $-deformed} generalized Kullback-Leibler divergence $ D_{K-L}^{q}[p(Y|X)||p(Y|\tilde X)] $. This result is achieved without enforcing any \textit{a priori} assumptions. Next, it is proven that the $ q^* $-deformed nonadditive free energy of the system is non-negative and convex. Finally, the update equations for the gIB method are derived. These results generalize critical features of the IB method to the case of Tsallis statistics.
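The additive duality $ q^* = 2 - q $ can be made concrete with the Tsallis $ q $-logarithm $ \ln_q x = (x^{1-q}-1)/(1-q) $, which satisfies $ \ln_{2-q}(x) = -\ln_q(1/x) $. As a consequence the Tsallis entropy can be written either as $ S_q = \sum_i p_i \ln_q(1/p_i) $ or as $ S_q = -\sum_i p_i \ln_{q^*} p_i $. The sketch below checks this numerically and implements one common form of the $ q $-deformed KL divergence, $ -\sum_i p_i \ln_q(r_i/p_i) $. The paper's exact sign and index conventions for $ D_{K-L}^{q} $ may differ; the function names here are hypothetical.

```python
import numpy as np

def ln_q(x, q):
    """Tsallis q-logarithm (x**(1-q) - 1)/(1-q); equals ln(x) at q = 1."""
    x = np.asarray(x, dtype=float)
    if np.isclose(q, 1.0):
        return np.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def tsallis_entropy(p, q):
    """S_q(p) = (1 - sum_i p_i**q)/(q - 1), written as sum_i p_i ln_q(1/p_i)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                         # zero-probability outcomes contribute nothing
    return float(np.sum(p * ln_q(1.0 / p, q)))

def tsallis_entropy_dual(p, q):
    """The same entropy via the additive dual q* = 2 - q: -sum_i p_i ln_{q*}(p_i)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * ln_q(p, 2.0 - q)))

def tsallis_relative_entropy(p, r, q):
    """One common q-deformed KL divergence: -sum_i p_i ln_q(r_i/p_i).
    Assumes r_i > 0 wherever p_i > 0; reduces to the usual KL divergence at q = 1."""
    p = np.asarray(p, dtype=float)
    r = np.asarray(r, dtype=float)
    m = p > 0
    return float(-np.sum(p[m] * ln_q(r[m] / p[m], q)))
```

For example, for $ p = (1/2, 1/2) $ and $ q = 2 $ both `tsallis_entropy` and `tsallis_entropy_dual` give $ 0.5 $, the latter evaluating the $ q^* = 0 $ logarithm.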

Data science is a broad umbrella term that encompasses data analytics, data mining, and machine learning. Owing to the rapid growth of data, professionals in these three areas have become immensely important to enterprises. While a data scientist is expected to forecast future trends based on historical patterns, data analysts extract useful insights from various data sources, and machine learning experts build models on data for future prediction and strategy formulation. Data science is a broad concept that encompasses big data; it includes data cleansing, preparation, and analysis. A data scientist collates data and applies machine learning, predictive analytics, and sentiment analysis to extract meaningful information from the collected data sets.

Mroueh, Youssef, Sercu, Tom, Goel, Vaibhava

We introduce new families of Integral Probability Metrics (IPMs) for training Generative Adversarial Networks (GANs). Our IPMs are based on matching statistics of distributions embedded in a finite-dimensional feature space. Mean and covariance feature matching IPMs allow for stable training of GANs; we call the resulting framework McGan. McGan minimizes a meaningful loss between distributions.
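To illustrate what "matching statistics in a finite-dimensional feature space" means, the sketch below compares two sample sets through already-embedded features Φ(x) (rows are samples). It uses the Euclidean distance between feature means and the sum of the top-k singular values of the difference of uncentered second-moment matrices. In McGan the critic learns the embedding and the projections; here Φ is fixed, and the specific norms and function names are assumptions of this sketch rather than the paper's definitions.

```python
import numpy as np

def mean_matching_distance(phi_p, phi_q):
    """Euclidean distance between empirical feature means (rows are samples)."""
    return float(np.linalg.norm(phi_p.mean(axis=0) - phi_q.mean(axis=0)))

def covariance_matching_distance(phi_p, phi_q, k=None):
    """Sum of the k largest singular values of the difference of the empirical
    uncentered second-moment matrices E[phi phi^T]; uses all of them if k is None."""
    sig_p = phi_p.T @ phi_p / len(phi_p)
    sig_q = phi_q.T @ phi_q / len(phi_q)
    s = np.linalg.svd(sig_p - sig_q, compute_uv=False)  # returned in descending order
    return float(s.sum() if k is None else s[:k].sum())
```

For instance, with features `X = [[0, 0], [2, 0]]` and the same points shifted by `[1, 0]`, the mean-matching distance is 1.0 and the covariance-matching distance is 3.0. A generator would be trained to drive such discrepancies toward zero.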

Yan, Junjie, Wan, Ruosi, Zhang, Xiangyu, Zhang, Wei, Wei, Yichen, Sun, Jian

Batch Normalization (BN) is one of the most widely used techniques in deep learning. It has been widely proven effective in many applications and has become an indispensable part of many state-of-the-art deep models. However, BN's performance degrades when the batch size is small. This weakness limits the use of BN in many computer vision tasks, such as detection or segmentation, where the batch size is usually small due to memory constraints. Therefore, many modified normalization techniques have been proposed, but they either fail to restore the performance of BN completely or introduce additional nonlinear operations into the inference procedure, which substantially increases the cost of inference. In this paper, we reveal that two extra batch statistics are involved in the backward propagation of BN, which have never been well discussed before. These extra batch statistics, associated with gradients, can also severely affect the training of deep neural networks. Based on our analysis, we propose a novel normalization method, named Moving Average Batch Normalization (MABN). MABN can completely restore the performance of vanilla BN in small-batch cases without introducing any additional nonlinear operations into the inference procedure. We demonstrate the benefits of MABN through both theoretical analysis and experiments. Our experiments show the effectiveness of MABN on multiple computer vision benchmarks, including ImageNet and COCO.
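The "two extra batch statistics" in BN's backward pass can be seen in the standard gradient formula. With $ \hat x = (x-\mu)/\sigma $, the input gradient is $ \partial L/\partial x = (\hat g - \mathrm{mean}(\hat g) - \hat x \cdot \mathrm{mean}(\hat g \hat x))/\sigma $, where $ \hat g = \partial L/\partial \hat x $. Our reading is that the two batch means in this expression, called `g` and `psi` below, are the backward-pass statistics the abstract refers to. The sketch implements vanilla BN only; the moving-average replacement that defines MABN is not implemented here, and the function names are hypothetical.

```python
import numpy as np

def bn_forward(x, gamma, beta, eps=1e-5):
    """Vanilla BN over the batch axis (axis 0) using the current batch's statistics."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)                  # biased (1/N) batch variance
    sigma = np.sqrt(var + eps)
    x_hat = (x - mu) / sigma
    return gamma * x_hat + beta, (x_hat, sigma, gamma)

def bn_backward(dy, cache):
    """Gradient of the loss w.r.t. x, also returning the two backward batch statistics."""
    x_hat, sigma, gamma = cache
    dx_hat = dy * gamma                  # dL/dx_hat
    g = dx_hat.mean(axis=0)              # batch mean of dL/dx_hat
    psi = (dx_hat * x_hat).mean(axis=0)  # batch mean of dL/dx_hat * x_hat
    dx = (dx_hat - g - x_hat * psi) / sigma
    return dx, g, psi
```

Like $ \mu $ and $ \sigma $ in the forward pass, `g` and `psi` are estimated from the current mini-batch, so they become noisy when the batch contains only a few samples.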