AITopics | depthwise

depthwise

SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling

Kim, Dahyun, Park, Chanjun, Kim, Sanghoon, Lee, Wonsung, Song, Wonho, Kim, Yunsu, Kim, Hyeonwoo, Kim, Yungi, Lee, Hyeonju, Kim, Jihoo, Ahn, Changbae, Yang, Seonghoon, Lee, Sukyung, Park, Hyunbyung, Gim, Gyoungjin, Cha, Mikyoung, Lee, Hwalsuk, Kim, Sunghun

arXiv.org Artificial Intelligence, Dec-28-2023

We introduce SOLAR 10.7B, a large language model (LLM) with 10.7 billion parameters, demonstrating superior performance in various natural language processing (NLP) tasks. Inspired by recent efforts to efficiently up-scale LLMs, we present a method for scaling LLMs called depth up-scaling (DUS), which encompasses depthwise scaling and continued pretraining. In contrast to other LLM up-scaling methods that use mixture-of-experts, DUS does not require complex changes to train and run inference efficiently. We show experimentally that DUS is simple yet effective in scaling up high-performance LLMs from small ones. Building on the DUS model, we additionally present SOLAR 10.7B-Instruct, a variant fine-tuned for instruction-following capabilities that surpasses Mixtral-8x7B-Instruct. SOLAR 10.7B is publicly available under the Apache 2.0 license, promoting broad access and application in the LLM field.
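The abstract does not give the layer arithmetic behind depthwise scaling. The sketch below assumes one common reading of the scheme: the base model's layer stack is duplicated, the last m layers are dropped from one copy and the first m from the other, and the two are concatenated. The function name `depth_up_scale` and the values n = 32 and m = 8 are illustrative assumptions, not taken from this listing, and the sketch covers only the layer selection, not the continued pretraining that DUS also requires.

```python
def depth_up_scale(num_base_layers: int, num_removed: int) -> list[int]:
    """Return base-model layer indices, in order, for the up-scaled stack.

    Hypothetical helper: it only computes which base layers would be
    reused, it does not touch any real model weights.
    """
    if not 0 <= num_removed < num_base_layers:
        raise ValueError("num_removed must be in [0, num_base_layers)")
    # First copy keeps layers 0 .. n-m-1 (its last m layers are dropped).
    first_copy = list(range(0, num_base_layers - num_removed))
    # Second copy keeps layers m .. n-1 (its first m layers are dropped).
    second_copy = list(range(num_removed, num_base_layers))
    return first_copy + second_copy


# Assumed example values: a 32-layer base model with 8 layers removed per copy.
layer_map = depth_up_scale(num_base_layers=32, num_removed=8)
print(len(layer_map))                 # depth of the up-scaled stack
print(layer_map[22:26])               # indices around the seam between copies
```

The seam between the two copies is where the layer indices jump backwards, which is one reason the up-scaled model would need continued pretraining before it performs well.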

arxiv preprint arxiv, dataset, preprint arxiv, (16 more...)

arXiv.org Artificial Intelligence

2312.15166

Country:

Asia > South Korea (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

NeFL: Nested Federated Learning for Heterogeneous Clients

Kang, Honggu, Cha, Seohyeon, Shin, Jinwoo, Lee, Jongmyeong, Kang, Joonhyuk

arXiv.org Artificial Intelligence, Oct-9-2023

Federated learning (FL) is a promising approach to distributed learning that preserves privacy. System heterogeneity, including heterogeneous computing and network bandwidth, has been addressed to mitigate the impact of stragglers. Previous studies tackle system heterogeneity by splitting a model into submodels, but with fewer degrees of freedom in terms of model architecture. We propose nested federated learning (NeFL), a generalized framework that efficiently divides a model into submodels using both depthwise and widthwise scaling. NeFL is implemented by interpreting forward propagation of models as solving ordinary differential equations (ODEs) with adaptive step sizes. To address the inconsistency that arises when training multiple submodels of different architectures, we decouple a few parameters from the parameters being trained for each submodel. NeFL enables resource-constrained clients to effectively join the FL pipeline and the model to be trained with a larger amount of data. Through a series of experiments, we demonstrate that NeFL leads to significant performance gains, especially for the worst-case submodel. Furthermore, we demonstrate that NeFL aligns with recent studies in FL regarding pre-trained models and statistical heterogeneity.

The success of deep learning owes much to vast amounts of training data, much of which comes from mobile devices and internet-of-things (IoT) devices. However, privacy regulations on data collection have become a critical concern, potentially impeding further advancement of deep learning (Dat, 2022; Dou et al., 2021). Federated learning (FL), a distributed machine learning framework, is gaining attention as a way to address these privacy concerns. FL enables model training by collaboratively leveraging the vast amount of data on clients while preserving data privacy. Rather than centralizing raw data, FL collects trained model weights from clients, which are subsequently aggregated on a server by a method such as FedAvg (McMahan et al., 2017).
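The server-side aggregation step mentioned above can be sketched as a weighted average of client parameters. This is a minimal illustration of plain FedAvg-style averaging, assuming every client trains the same architecture and reports a flat list of weights; the function name `fed_avg` and the toy values are hypothetical, and the sketch does not cover NeFL's handling of submodels with different depths and widths.

```python
def fed_avg(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local dataset size.

    client_weights: list of equal-length lists of floats, one per client.
    client_sizes:   number of local training examples per client.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one dataset size per client")
    dim = len(client_weights[0])
    if any(len(w) != dim for w in client_weights):
        raise ValueError("all clients must report the same number of weights")
    total = sum(client_sizes)
    averaged = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        for i, w in enumerate(weights):
            # Each client contributes in proportion to its share of the data.
            averaged[i] += w * size / total
    return averaged


# Toy example: the second client holds three times as much data as the first.
global_weights = fed_avg([[1.0, 2.0], [3.0, 4.0]], [1, 3])
print(global_weights)
```

Only the weights and dataset sizes reach the server in this sketch; the raw training examples stay on the clients.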

international conference, step size, submodel, (16 more...)

arXiv.org Artificial Intelligence

2308.07761

Country:

North America > United States > Virginia (0.04)
North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Analysing Affective Behavior in the First ABAW 2020 Competition

Kollias, Dimitrios, Schulc, Attila, Hajiyev, Elnar, Zafeiriou, Stefanos

arXiv.org Machine Learning, Jan-30-2020

Dimitrios Kollias (1), Attila Schulc (2), Elnar Hajiyev (2) and Stefanos Zafeiriou (1); (1) Department of Computing, Imperial College London, UK; (2) Realeyes - Emotional Intelligence

Abstract: The Affective Behavior Analysis in-the-wild (ABAW) 2020 Competition is the first Competition aiming at automatic analysis of the three main behavior tasks of valence-arousal estimation, basic expression recognition and action unit detection. It is split into three Challenges, each one addressing a respective behavior task. For the Challenges, we provide a common benchmark database, Aff-Wild2, which is a large-scale in-the-wild database and the first one annotated for all three of these tasks. In this paper, we describe this Competition, to be held in conjunction with the IEEE Conference on Face and Gesture Recognition, May 2020, in Buenos Aires, Argentina. We present the three Challenges, along with the Competition corpora used. We outline the evaluation metrics and present the baseline methodologies and the results obtained when these are applied to each Challenge.

aff-wild2, depthwise, recognition, (13 more...)

arXiv.org Machine Learning

2001.11409

Country:

South America > Argentina > Pampas > Buenos Aires F.D. > Buenos Aires (0.24)
Europe > United Kingdom > England > Greater London > London (0.24)

Genre: Research Report (0.40)

Industry: Health & Medicine (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Design Automation for Efficient Deep Learning Computing

Han, Song, Cai, Han, Zhu, Ligeng, Lin, Ji, Wang, Kuan, Liu, Zhijian, Lin, Yujun

arXiv.org Machine Learning, Apr-23-2019

Efficient deep learning computing requires algorithm and hardware co-design to enable specialization: we usually need to change the algorithm to reduce memory footprint and improve energy efficiency. However, the extra degree of freedom from the algorithm makes the design space much larger: it's not only about designing the hardware but also about how to tweak the algorithm to best fit the hardware. Human engineers can hardly exhaust the design space by heuristics; doing so is labor-intensive and sub-optimal. We propose design automation techniques for efficient neural networks. We investigate automatically designing specialized fast models, auto channel pruning, and auto mixed-precision quantization. We demonstrate that such learning-based, automated design achieves better performance and efficiency than rule-based human design. Moreover, we shorten the design cycle by 200x compared with previous work, so that we can afford to design specialized neural network models for different hardware platforms.
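To make "mixed-precision quantization" concrete, the sketch below applies symmetric uniform quantization with a different bit width per layer. It is only an illustration under stated assumptions: the layer names, weights, and bit widths are made up, and the bit widths are fixed by hand, whereas the abstract describes choosing them automatically with a learning-based method that is not reproduced here.

```python
def quantize(weights, bits):
    """Symmetric uniform quantization of a list of floats to `bits` bits.

    Returns de-quantized floats so the rounding error is easy to inspect.
    """
    if bits < 2:
        raise ValueError("need at least 2 bits for a signed symmetric grid")
    levels = 2 ** (bits - 1) - 1              # e.g. 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)
    scale = max_abs / levels                  # width of one quantization step
    return [round(w / scale) * scale for w in weights]


def mean_abs_error(original, approx):
    """Average absolute difference between two equal-length lists."""
    return sum(abs(x - y) for x, y in zip(original, approx)) / len(original)


# Hypothetical layers and a hand-picked per-layer bit-width assignment.
layers = {"conv1": [0.9, -0.31, 0.07, 0.5], "fc": [0.12, -0.8, 0.33, 0.02]}
bit_widths = {"conv1": 8, "fc": 4}

quantized = {name: quantize(w, bit_widths[name]) for name, w in layers.items()}
for name in layers:
    err = mean_abs_error(layers[name], quantized[name])
    print(name, bit_widths[name], "bits, mean abs error:", err)
```

Fewer bits shrink the memory footprint but enlarge the rounding error, which is the trade-off an automated search over per-layer bit widths would have to balance against hardware cost.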

artificial intelligence, deep learning, machine learning, (19 more...)

arXiv.org Machine Learning

1904.10616

Genre: Research Report (0.50)

Industry: Education > Curriculum > Subject-Specific Education (0.61)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.85)
