adan
Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models
Xie, Xingyu, Zhou, Pan, Li, Huan, Lin, Zhouchen, Yan, Shuicheng
In deep learning, different kinds of deep networks typically need different optimizers, which have to be chosen after multiple trials, making the training process inefficient. To relieve this issue and consistently improve the model training speed across deep networks, we propose the ADAptive Nesterov momentum algorithm, Adan for short. Adan first reformulates the vanilla Nesterov acceleration to develop a new Nesterov momentum estimation (NME) method, which avoids the extra overhead of computing the gradient at the extrapolation point. Then Adan adopts NME to estimate the gradient's first- and second-order moments in adaptive gradient algorithms for convergence acceleration. Besides, we prove that Adan finds an $\epsilon$-approximate first-order stationary point within $O(\epsilon^{-3.5})$ stochastic gradient complexity on non-convex stochastic problems (e.g., deep learning problems), matching the best-known lower bound. Extensive experimental results show that Adan consistently surpasses the corresponding SoTA optimizers on vision, language, and RL tasks and sets new SoTAs for many popular networks and frameworks, e.g., ResNet, ConvNext, ViT, Swin, MAE, DETR, GPT-2, Transformer-XL, and BERT. More surprisingly, Adan can use half of the training cost (epochs) of SoTA optimizers to achieve higher or comparable performance on ViT, GPT-2, MAE, etc., and also shows great tolerance to a wide range of minibatch sizes, e.g., from 1k to 32k. Code is released at https://github.com/sail-sg/Adan, and has been used in multiple popular deep learning frameworks or projects.
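The abstract describes NME only in words, so the following is a minimal pure-Python sketch of an Adan-style update, not the released implementation. The exact coefficients, the absence of bias correction, the placement of `eps`, and the names `adan_step` and `minimize_quadratic` are all assumptions made for illustration. The point it tries to show is that each step uses only the gradient at the current iterate plus the difference from the previous gradient, so no extra gradient is computed at an extrapolation point.

```python
import math


def adan_step(theta, grad, prev_grad, state, lr=0.1,
              beta1=0.02, beta2=0.08, beta3=0.01,
              eps=1e-8, weight_decay=0.0):
    """One Adan-style update on lists of floats (illustrative assumption)."""
    m, v, n = state
    new_theta, new_m, new_v, new_n = [], [], [], []
    for t, g, pg, mi, vi, ni in zip(theta, grad, prev_grad, m, v, n):
        diff = g - pg                                # gradient difference
        mi = (1 - beta1) * mi + beta1 * g            # first-moment estimate
        vi = (1 - beta2) * vi + beta2 * diff         # moving average of differences
        u = g + (1 - beta2) * diff                   # Nesterov-style corrected gradient
        ni = (1 - beta3) * ni + beta3 * u * u        # second-moment estimate
        step = lr / (math.sqrt(ni) + eps)            # per-coordinate step size
        t = (t - step * (mi + (1 - beta2) * vi)) / (1 + weight_decay * lr)
        new_theta.append(t)
        new_m.append(mi)
        new_v.append(vi)
        new_n.append(ni)
    return new_theta, (new_m, new_v, new_n)


def minimize_quadratic(x0=5.0, steps=20):
    """Run the sketch on f(x) = x^2, whose gradient is 2x."""
    theta = [x0]
    state = ([0.0], [0.0], [0.0])
    prev_grad = None
    for _ in range(steps):
        grad = [2.0 * theta[0]]          # one gradient per step, at the current point
        if prev_grad is None:
            prev_grad = grad             # first step: gradient difference is zero
        theta, state = adan_step(theta, grad, prev_grad, state)
        prev_grad = grad
    return theta[0]
```

Calling `minimize_quadratic()` moves the iterate from 5.0 toward the minimizer at 0. For real training, the code at the repository linked above is the reference.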
Adan
This complex manufacturing environment is characterized by a large product and batch size variety, numerous parallel machines with large capacity differences, sequence- and machine-dependent setup times, and machine eligibility constraints. A hybrid genetic algorithm is proposed to improve the scheduling process, the main features of which are a local search enhanced crossover mechanism, two additional fast local search procedures, and a user-controlled multi-objective fitness function. Testing with real-life production data shows that this multi-objective approach can strike the desired balance between production time, setup time, and tardiness, yielding high-quality, practically feasible production schedules.
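The abstract does not say how the user-controlled fitness function combines its objectives. One common form is a weighted sum, sketched below purely as an assumption. The class `ScheduleMetrics`, the function `fitness`, and the weight parameters are hypothetical names, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class ScheduleMetrics:
    """Aggregate costs of one candidate schedule (hypothetical fields)."""
    production_time: float
    setup_time: float
    tardiness: float


def fitness(metrics, w_production=1.0, w_setup=1.0, w_tardiness=1.0):
    """Weighted-sum cost; lower is better. The weights are the user's controls."""
    return (w_production * metrics.production_time
            + w_setup * metrics.setup_time
            + w_tardiness * metrics.tardiness)


# Two candidate schedules: b is faster overall but finishes some jobs late.
a = ScheduleMetrics(production_time=100.0, setup_time=10.0, tardiness=0.0)
b = ScheduleMetrics(production_time=90.0, setup_time=10.0, tardiness=20.0)
```

With equal weights, schedule `a` has the lower cost. Setting `w_tardiness=0.0` makes `b` preferable, which is the kind of trade-off a user-controlled weighting would expose.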
Adversarial Domain Adaptation for Stable Brain-Machine Interfaces
Farshchian, Ali, Gallego, Juan A., Cohen, Joseph P., Bengio, Yoshua, Miller, Lee E., Solla, Sara A.
Brain-Machine Interfaces (BMIs) have recently emerged as a clinically viable option to restore voluntary movements after paralysis. These devices are based on the ability to extract information about movement intent from neural signals recorded using multi-electrode arrays chronically implanted in the motor cortices of the brain. However, the inherent loss and turnover of recorded neurons requires repeated recalibrations of the interface, which can potentially alter the day-to-day user experience. The resulting need for continued user adaptation interferes with the natural, subconscious use of the BMI. Here, we introduce a new computational approach that decodes movement intent from a low-dimensional latent representation of the neural data. We implement various domain adaptation methods to stabilize the interface over significantly long times. This includes Canonical Correlation Analysis used to align the latent variables across days; this method requires prior point-to-point correspondence of the time series across domains. Alternatively, we match the empirical probability distributions of the latent variables across days through the minimization of their Kullback-Leibler divergence. These two methods provide a significant and comparable improvement in the performance of the interface. However, implementation of an Adversarial Domain Adaptation Network trained to match the empirical probability distribution of the residuals of the reconstructed neural signals outperforms the two methods based on latent variables, while requiring remarkably few data points to solve the domain adaptation problem.
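To make the distribution-matching idea concrete, here is a small sketch that fits a univariate Gaussian to a latent variable on two days and measures their Kullback-Leibler divergence. The Gaussian assumption, the closed-form affine alignment, and the names `gaussian_kl`, `fit`, and `align` are illustrative assumptions. The abstract does not specify the density model or the optimizer, and this is not the adversarial network it describes.

```python
import math
import statistics


def gaussian_kl(mu_p, sd_p, mu_q, sd_q):
    """KL(P || Q) for univariate Gaussians P and Q."""
    return (math.log(sd_q / sd_p)
            + (sd_p ** 2 + (mu_p - mu_q) ** 2) / (2 * sd_q ** 2)
            - 0.5)


def fit(samples):
    """Mean and population standard deviation of a sample."""
    return statistics.fmean(samples), statistics.pstdev(samples)


def align(day_k, day_0):
    """Shift and rescale day-k samples to match day-0 mean and spread.

    For Gaussians this affine map drives the KL divergence to zero; it is a
    stand-in for an iterative minimization, not the paper's procedure.
    """
    mu_k, sd_k = fit(day_k)
    mu_0, sd_0 = fit(day_0)
    return [(x - mu_k) / sd_k * sd_0 + mu_0 for x in day_k]


# Toy latent values (made up for illustration, not real neural data).
day_0 = [0.0, 1.0, 2.0, 3.0]
day_k = [10.0, 12.0, 14.0, 16.0]

kl_before = gaussian_kl(*fit(day_k), *fit(day_0))
kl_after = gaussian_kl(*fit(align(day_k, day_0)), *fit(day_0))
```

In this toy case `kl_before` is large because the day-k values have drifted in both offset and scale, while `kl_after` is zero up to floating-point error. Unlike Canonical Correlation Analysis, this matching needs no point-to-point correspondence between the two days' time series.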
Noon in the antilibrary
Marius cursed and jammed a mic stand between the crash bars of the TV studio door. "If SWAT's on its way, we don't have much time," he said. Michaela, who up until a couple of minutes ago had been streaming their interview live, still sat on one of the oval chairs under the hot lights. "What are they talking about?" The cube-shaped television studio had black-painted walls surrounding the bright stage area. Big monitors on the walls were showing the same "live" feed as they had five minutes ago, but now a red banner flashed at the bottom of the screens: ACTIVE SHOOTER AT COMPLETE PICTURES BUILDING. Michaela pointed at a moving figure on the screen. "Apparently I like assault rifles." Adan, their cameraman, had called up a local news feed after the first shouts of panic and confusion filtered through the studio's thick doors. What it showed was entirely and completely not what the three of them were seeing. Marius was inside the windowless second-floor studio, empty-handed, yet the monitors showed what looked like a drone feed of him moving into and out of view through the building's windows on the 10th floor. He was armed, and every now and then he would pause and shoot, calmly and methodically. Marius shook his head in disgust. "Hey, Adan, could you give me a hand with this?" The cameraman was hunched over his laptop. "The same people who own the SWAT team," said Marius. "But forget what I said.