shakeout
China's AI 'war of a hundred models' heads for a shakeout
China's craze over generative artificial intelligence has triggered a flurry of product announcements from startups and tech giants on an almost daily basis, but investors are warning a shakeout is imminent as cost and profit pressures grow. The buzz in China, first ignited by the success of OpenAI's ChatGPT almost a year ago, has given rise to what a senior Tencent executive described this month as a "war of a hundred models," as it and rivals from Baidu to Alibaba to Huawei promote their offerings. China now has at least 130 large language models (LLMs), accounting for 40% of the global total and just behind the United States' 50% share, according to brokerage CLSA. Companies have also announced dozens of "industry-specific LLMs" that link to their core model.
Shakeout: A New Approach to Regularized Deep Neural Network Training
Kang, Guoliang, Li, Jun, Tao, Dacheng
Recent years have witnessed the success of deep neural networks in dealing with plenty of practical problems. Dropout has played an essential role in many successful deep neural networks by inducing regularization in model training. In this paper, we present a new regularized training approach: Shakeout. Instead of randomly discarding units as Dropout does at the training stage, Shakeout randomly chooses to enhance or reverse each unit's contribution to the next layer. This minor modification of Dropout has a notable statistical property: the regularizer induced by Shakeout adaptively combines $L_0$, $L_1$ and $L_2$ regularization terms. Our classification experiments with representative deep architectures on the image datasets MNIST, CIFAR-10 and ImageNet show that Shakeout deals with over-fitting effectively and outperforms Dropout. We empirically demonstrate that Shakeout leads to sparser weights under both unsupervised and supervised settings. Shakeout also leads to a grouping effect among the input units in a layer. Its weights reflect the importance of connections better than those obtained with Dropout, which is valuable for deep model compression. Moreover, we demonstrate that Shakeout can effectively reduce the instability of the training process of deep architectures.
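The "enhance or reverse" step described above can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the abstract does not give the update rule, so the specific form used here (a drop probability `tau`, a constant `c`, and the sign of each weight) is an assumption, chosen so that the perturbed weights equal the original weights in expectation. The function names `shakeout_weights`, `sample_mask` and `forward` are hypothetical.

```python
import random


def shakeout_weights(weights, keep_mask, tau, c):
    """Return perturbed weights for one layer (assumed Shakeout-style rule).

    weights[i][j] connects input unit j to output unit i.
    keep_mask[j] is 1 if input unit j is "enhanced", 0 if it is "reversed".
    tau is the probability of reversing a unit; c >= 0 sets the strength.
    """
    def sign(w):
        return (w > 0) - (w < 0)

    perturbed = []
    for row in weights:
        new_row = []
        for w, keep in zip(row, keep_mask):
            if keep:
                # Enhance: push the weight away from zero, then rescale.
                new_row.append((w + c * tau * sign(w)) / (1.0 - tau))
            else:
                # Reverse: replace the weight by a constant of opposite sign.
                new_row.append(-c * sign(w))
        perturbed.append(new_row)
    return perturbed


def sample_mask(n_units, tau, rng=random):
    """Draw one Bernoulli(1 - tau) keep/reverse decision per input unit."""
    return [1 if rng.random() >= tau else 0 for _ in range(n_units)]


def forward(weights, x, tau, c, rng=random):
    """Training-time forward pass of a linear layer with perturbed weights."""
    mask = sample_mask(len(x), tau, rng)
    w_tilde = shakeout_weights(weights, mask, tau, c)
    return [sum(w * xj for w, xj in zip(row, x)) for row in w_tilde]
```

Under this assumed rule, setting `c = 0` zeroes the reversed units and rescales the kept ones by `1 / (1 - tau)`, which is the usual inverted form of Dropout; a positive `c` is what would introduce the sign-dependent term.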
Whiteout: Gaussian Adaptive Noise Regularization in FeedForward Neural Networks
Noise injection (NI) is an approach to mitigate over-fitting in feedforward neural networks (NNs). The Bernoulli NI procedure as implemented in dropout and shakeout has connections with $l_1$ and $l_2$ regularization on the NN model parameters and demonstrates the efficiency and feasibility of NI in regularizing NNs. We propose whiteout, a new NI regularization technique with adaptive Gaussian noise in NNs. Whiteout is more versatile than dropout and shakeout. We show that the optimization objective function associated with whiteout in generalized linear models has a closed-form penalty term that has connections with a wide range of regularization methods and includes the bridge, lasso, ridge, and elastic net penalization as special cases; it can also be extended to offer regularization similar to the adaptive lasso and group lasso. We prove that whiteout can also be viewed as robust learning of NNs in the presence of small perturbations in input and hidden nodes. We establish that the noise-perturbed empirical loss function with whiteout converges almost surely to the ideal loss function, and the estimates of NN parameters obtained from minimizing the former loss function are consistent with those obtained from minimizing the ideal loss function. Computationally, whiteout can be easily incorporated in the back-propagation algorithm. The superiority of whiteout over dropout and shakeout in learning NNs with relatively small training data is demonstrated using the LSVT voice rehabilitation data and the LIBRAS hand movement data.
Shakeout: A New Regularized Deep Neural Network Training Scheme
Kang, Guoliang (University of Technology Sydney) | Li, Jun (University of Technology Sydney) | Tao, Dacheng (University of Technology Sydney)
Recent years have witnessed the success of deep neural networks in dealing with plenty of practical problems. The invention of effective training techniques has contributed largely to this success. The so-called "Dropout" training scheme is one of the most powerful tools for reducing over-fitting. From the statistical point of view, Dropout works by implicitly imposing an L2 regularizer on the weights. In this paper, we present a new training scheme: Shakeout. Instead of randomly discarding units as Dropout does at the training stage, our method randomly chooses to enhance or reverse the contribution of each unit to the next layer. We show that our scheme leads to a combination of L1 regularization and L2 regularization imposed on the weights, which has been proved effective by the Elastic Net models in practice. We have empirically evaluated the Shakeout scheme and demonstrated that sparse network weights are obtained via Shakeout training. Our classification experiments on real-life image datasets MNIST and CIFAR-10 show that Shakeout deals with over-fitting effectively.
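For reference, the combination of L1 and L2 regularization mentioned above has the following generic Elastic Net form. This is the standard textbook penalty, not the exact regularizer derived for Shakeout; the symbols $w$, $\lambda_1$ and $\lambda_2$ are notation assumed here for illustration.

```latex
$$
\Omega(w) \;=\; \lambda_1 \lVert w \rVert_1 + \lambda_2 \lVert w \rVert_2^2
\;=\; \lambda_1 \sum_i \lvert w_i \rvert + \lambda_2 \sum_i w_i^2,
\qquad \lambda_1, \lambda_2 \ge 0 .
$$
```

Setting $\lambda_1 = 0$ leaves a pure L2 (ridge) penalty, and setting $\lambda_2 = 0$ leaves a pure L1 (lasso) penalty; the L1 term is the part that encourages sparse weights.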