depthwise
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
Kim, Dahyun, Park, Chanjun, Kim, Sanghoon, Lee, Wonsung, Song, Wonho, Kim, Yunsu, Kim, Hyeonwoo, Kim, Yungi, Lee, Hyeonju, Kim, Jihoo, Ahn, Changbae, Yang, Seonghoon, Lee, Sukyung, Park, Hyunbyung, Gim, Gyoungjin, Cha, Mikyoung, Lee, Hwalsuk, Kim, Sunghun
We introduce SOLAR 10.7B, a large language model (LLM) with 10.7 billion parameters, demonstrating superior performance in various natural language processing (NLP) tasks. Inspired by recent efforts to efficiently up-scale LLMs, we present a method for scaling LLMs called depth up-scaling (DUS), which encompasses depthwise scaling and continued pretraining. In contrast to other LLM up-scaling methods that use mixture-of-experts, DUS does not require complex changes to train and run inference efficiently. We show experimentally that DUS is simple yet effective in scaling up high-performance LLMs from small ones. Building on the DUS model, we additionally present SOLAR 10.7B-Instruct, a variant fine-tuned for instruction-following capabilities, surpassing Mixtral-8x7B-Instruct. SOLAR 10.7B is publicly available under the Apache 2.0 license, promoting broad access and application in the LLM field.
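The abstract does not spell out the depthwise scaling step. The following is a minimal sketch, assuming that the step duplicates the base model's layer stack, trims `m` layers at the seam from each copy, and concatenates what remains. The function name `depth_up_scale`, the string stand-ins for layers, and the values `n = 32` and `m = 8` are illustrative assumptions, not details taken from the abstract. The continued pretraining that follows is not shown.

```python
import copy


def depth_up_scale(layers, m):
    """Return the layer list of a depth up-scaled model (hypothetical helper).

    layers: ordered list of the base model's layers (any objects)
    m: number of layers trimmed at the seam from each copy (assumed parameter)
    """
    n = len(layers)
    if not 0 <= m < n:
        raise ValueError("m must satisfy 0 <= m < len(layers)")
    # First copy: keep the first n - m layers (drop the last m).
    top = list(layers[: n - m])
    # Second copy: keep the last n - m layers (drop the first m).
    # Deep-copied so the two halves do not share parameters.
    bottom = [copy.deepcopy(layer) for layer in layers[m:]]
    # The result has 2 * (n - m) layers.
    return top + bottom


# Strings stand in for real transformer blocks in this sketch.
base = [f"layer_{i}" for i in range(32)]
scaled = depth_up_scale(base, m=8)
print(len(scaled))  # 48
```

Under these assumptions the scaled stack has 2 * (32 - 8) = 48 layers, and the seam sits between `layer_23` of the first copy and `layer_8` of the second.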
NeFL: Nested Federated Learning for Heterogeneous Clients
Kang, Honggu, Cha, Seohyeon, Shin, Jinwoo, Lee, Jongmyeong, Kang, Joonhyuk
Federated learning (FL) is a promising approach to distributed learning that preserves privacy. System heterogeneity, including heterogeneous computing and network bandwidth, has been addressed to mitigate the impact of stragglers. Previous studies tackle system heterogeneity by splitting a model into submodels, but with fewer degrees of freedom in terms of model architecture. We propose nested federated learning (NeFL), a generalized framework that efficiently divides a model into submodels using both depthwise and widthwise scaling. NeFL is implemented by interpreting forward propagation of models as solving ordinary differential equations (ODEs) with adaptive step sizes. To address the inconsistency that arises when training multiple submodels of different architectures, we decouple a few parameters from the parameters being trained for each submodel. NeFL enables resource-constrained clients to effectively join the FL pipeline and the model to be trained with a larger amount of data. Through a series of experiments, we demonstrate that NeFL leads to significant performance gains, especially for the worst-case submodel. Furthermore, we demonstrate that NeFL aligns with recent studies in FL regarding pre-trained models and statistical heterogeneity. The success of deep learning owes much to vast amounts of training data, a large share of which comes from mobile devices and internet-of-things (IoT) devices. However, privacy regulations on data collection have become a critical concern, potentially impeding further advancement of deep learning (Dat, 2022; Dou et al., 2021). Federated learning (FL), a distributed machine learning framework, is attracting attention as a way to address these privacy concerns. FL enables model training by collaboratively leveraging the vast amount of data on clients while preserving data privacy.
Rather than centralizing raw data, FL collects trained model weights from clients, which are subsequently aggregated on a server using a method such as FedAvg (McMahan et al., 2017).
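The server-side aggregation mentioned above can be sketched as a sample-count-weighted average of the client weights, which is the usual form of FedAvg. This is a minimal sketch, not NeFL's own aggregation: the function name `fedavg`, the dict-of-arrays weight format, and the toy client data are assumptions made for illustration.

```python
import numpy as np


def fedavg(client_weights, client_sizes):
    """Average client parameters, weighting each client by its sample count.

    client_weights: list of dicts mapping parameter name -> np.ndarray
    client_sizes: list of local training-set sizes, one per client
    """
    if len(client_weights) != len(client_sizes) or not client_weights:
        raise ValueError("need one size per client and at least one client")
    total = float(sum(client_sizes))
    return {
        name: sum(w[name] * (n / total)
                  for w, n in zip(client_weights, client_sizes))
        for name in client_weights[0]
    }


# Two toy clients; the second holds three times as much data as the first.
clients = [
    {"w": np.array([1.0, 2.0])},
    {"w": np.array([3.0, 4.0])},
]
global_weights = fedavg(clients, client_sizes=[1, 3])
print(global_weights["w"])  # [2.5 3.5]
```

Only the weights and the sample counts reach the server in this sketch; the raw client data never does.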
Analysing Affective Behavior in the First ABAW 2020 Competition
Kollias, Dimitrios, Schulc, Attila, Hajiyev, Elnar, Zafeiriou, Stefanos
Affiliations: Department of Computing, Imperial College London, UK (Kollias, Zafeiriou); Realeyes - Emotional Intelligence (Schulc, Hajiyev). The Affective Behavior Analysis in-the-wild (ABAW) 2020 Competition is the first Competition aiming at automatic analysis of the three main behavior tasks of valence-arousal estimation, basic expression recognition and action unit detection. It is split into three Challenges, each one addressing a respective behavior task. For the Challenges, we provide a common benchmark database, Aff-Wild2, which is a large-scale in-the-wild database and the first one annotated for all three tasks. In this paper, we describe this Competition, to be held in conjunction with the IEEE Conference on Face and Gesture Recognition, May 2020, in Buenos Aires, Argentina. We present the three Challenges, with the utilized Competition corpora. We outline the evaluation metrics and present the baseline methodologies and the results obtained when these are applied to each Challenge.
Design Automation for Efficient Deep Learning Computing
Han, Song, Cai, Han, Zhu, Ligeng, Lin, Ji, Wang, Kuan, Liu, Zhijian, Lin, Yujun
Efficient deep learning computing requires algorithm and hardware co-design to enable specialization: we usually need to change the algorithm to reduce memory footprint and improve energy efficiency. However, the extra degree of freedom from the algorithm makes the design space much larger: it's not only about designing the hardware but also about how to tweak the algorithm to best fit the hardware. Human engineers can hardly exhaust the design space with heuristics; doing so is labor-intensive and sub-optimal. We propose design automation techniques for efficient neural networks. We investigate automatically designing specialized fast models, auto channel pruning, and auto mixed-precision quantization. We demonstrate that such learning-based, automated design achieves better performance and efficiency than rule-based human design. Moreover, we shorten the design cycle by 200x compared with previous work, so that we can afford to design specialized neural network models for different hardware platforms.