AITopics | pre-training process

Collaborating Authors

pre-training process

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Self-supervised learning on gene expression data

Dradjat, Kevin, Hamidi, Massinissa, Bartet, Pierre, Hanczar, Blaise

arXiv.org Artificial IntelligenceSep-18-2025

Predicting phenotypes from gene expression data is a crucial task in biomedical research, enabling insights into disease mechanisms, drug responses, and personalized medicine. Traditional machine learning and deep learning rely on supervised learning, which requires large quantities of labeled data that are costly and time-consuming to obtain in the case of gene expression data. Self-supervised learning has recently emerged as a promising approach to overcome these limitations by extracting information directly from the structure of unlabeled data. In this study, we investigate the application of state-of-the-art self-supervised learning methods to bulk gene expression data for phenotype prediction. We selected three self-supervised methods, based on different approaches, to assess their ability to exploit the inherent structure of the data and to generate qualitative representations which can be used for downstream predictive tasks. By using several publicly available gene expression datasets, we demonstrate how the selected methods can effectively capture complex information and improve phenotype prediction accuracy. The results obtained show that self-supervised learning methods can outperform traditional supervised models besides offering significant advantage by reducing the dependency on annotated data. We provide a comprehensive analysis of the performance of each method by highlighting their strengths and limitations. We also provide recommendations for using these methods depending on the case under study. Finally, we outline future research directions to enhance the application of self-supervised learning in the field of gene expression data analysis. This study is the first work that deals with bulk RNA-Seq data and self-supervised learning.

artificial intelligence, inductive learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2507.13912

Country: Europe > France (0.28)

Genre: Research Report > New Finding (0.67)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Add feedback

Binary-Integer-Programming Based Algorithm for Expert Load Balancing in Mixture-of-Experts Models

Sun, Yuan

arXiv.org Artificial IntelligenceMar-20-2025

For pre-training of MoE (Mixture-of-Experts) models, one of the main issues is unbalanced expert loads, which may cause routing collapse or increased computational overhead. Existing methods contain the Loss-Controlled method and the Loss-Free method, where both the unbalanced degrees at first several training steps are still high and decrease slowly. In this work, we propose BIP-Based Balancing, an expert load balancing algorithm based on binary integer programming (BIP). The algorithm maintains an additional vector q on each MoE layer that can help change the top-K order of s by solving a binary integer programming with very small time costs. We implement the algorithm on two MoE language models: 16-expert (0.3B) and 64-expert (1.1B). The experimental results show that on both models comparing with the Loss-Controlled method and the Loss-Free method, our algorithm trains models with the lowest perplexities, while saves at least 13% of pre-training time compared with the Loss-Controlled method. Within our current knowledge, this is the first routing algorithm that achieves maintaining load balance status on every expert in every MoE layer from the first step to the last step during the whole pre-training process, while the trained MoE models also perform well. The code material of this work is available at https://github.com/sunyuanLLM/bip_routing_algorithm.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2502.15451

Genre:

Research Report (0.71)
Workflow (0.69)

Industry: Energy > Power Industry (0.62)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.47)

Add feedback

The Rise and Down of Babel Tower: Investigating the Evolution Process of Multilingual Code Large Language Model

Chen, Jiawei, Chen, Wentao, Su, Jing, Xu, Jingjing, Lin, Hongyu, Ren, Mengjie, Lu, Yaojie, Han, Xianpei, Sun, Le

arXiv.org Artificial IntelligenceDec-10-2024

Large language models (LLMs) have shown significant multilingual capabilities. However, the mechanisms underlying the development of these capabilities during pre-training are not well understood. In this paper, we use code LLMs as an experimental platform to explore the evolution of multilingual capabilities in LLMs during the pre-training process. Based on our observations, we propose the Babel Tower Hypothesis, which describes the entire process of LLMs acquiring new language capabilities. During the learning process, multiple languages initially share a single knowledge system dominated by the primary language and gradually develop language-specific knowledge systems. Experimental results show that the internal state changes of the LLM are consistent with our Babel Tower Hypothesis. Building on these insights, we propose a novel method to construct an optimized pre-training corpus for multilingual code LLMs, which significantly outperforms LLMs trained on the original corpus. The proposed Babel Tower Hypothesis provides new insights into designing pre-training data distributions to achieve optimal multilingual capabilities in LLMs. A united human race speaking a single language migrates to Shinar where they agree to build a great city with a tower that would reach the sky. Yahweh, observing these efforts and remarking on humanity's power in unity, confounds their speech so that they can no longer understand each other and scatters them around the world, leaving the city unfinished.

large language model, machine learning, proportion, (17 more...)

arXiv.org Artificial Intelligence

2412.07298

Country:

Asia > Singapore (0.05)
North America > Mexico > Mexico City > Mexico City (0.04)
Asia > Indonesia > Bali (0.04)
(6 more...)

Genre: Research Report > New Finding (0.48)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Investigating the Pre-Training Dynamics of In-Context Learning: Task Recognition vs. Task Learning

Wang, Xiaolei, Tang, Xinyu, Zhao, Wayne Xin, Wen, Ji-Rong

arXiv.org Artificial IntelligenceJun-20-2024

The emergence of in-context learning (ICL) is potentially attributed to two major abilities: task recognition (TR) for recognizing the task from demonstrations and utilizing pre-trained priors, and task learning (TL) for learning from demonstrations. However, relationships between the two abilities and how such relationships affect the emergence of ICL is unclear. In this paper, we take the first step by examining the pre-training dynamics of the emergence of ICL. With carefully designed metrics, we find that these two abilities are, in fact, competitive during pre-training. Moreover, we observe a strong negative correlation between the competition and ICL performance. Further analysis of common pre-training factors (i.e., model size, dataset size, and data curriculum) demonstrates possible ways to manage the competition. Based on these insights, we propose a simple yet effective method to better integrate these two abilities for ICL at inference time. Through adaptive ensemble learning, the performance of ICL can be significantly boosted, enabling two small models to outperform a larger one with more than twice the parameters. The code is available at https://github.com/RUCAIBox/Competitive-ICL.

competition, icl, tr and tl, (17 more...)

arXiv.org Artificial Intelligence

2406.14022

Country: Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

What is BERT?

FOX NewsMay-2-2023, 23:11:30 GMT

Naftali Bennett spoke exclusively with Fox News Digital about the benefits of AI and the need to set parameters for its use now. BERT is an open-source machine learning framework that is used for various natural language processing (NLP) tasks. It is designed to help computers better understand nuance in language by grasping the meaning of surrounding words in a text. The benefit is that context of a text can be understood rather than just the meaning of individual words. It is no secret that artificial intelligence impacts society in surprising ways.

algorithm, bert, google, (13 more...)

FOX News

Country: North America > United States > California > Santa Clara County > Mountain View (0.05)

Industry: Media > News (0.36)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.98)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.73)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.72)

Add feedback

Towards mapping the contemporary art world with ArtLM: an art-specific NLP model

Chen, Qinkai, El-Mennaoui, Mohamed, Fosset, Antoine, Rebei, Amine, Cao, Haoyang, Bouscasse, Philine, O'Beirne, Christy Eóin, Shevchenko, Sasha, Rosenbaum, Mathieu

arXiv.org Artificial IntelligenceDec-22-2022

With an increasing amount of data in the art world, discovering artists and artworks suitable to collectors' tastes becomes a challenge. It is no longer enough to use visual information, as contextual information about the artist has become just as important in contemporary art. In this work, we present a generic Natural Language Processing framework (called ArtLM) to discover the connections among contemporary artists based on their biographies. In this approach, we first continue to pre-train the existing general English language models with a large amount of unlabelled art-related data. We then fine-tune this new pre-trained model with our biography pair dataset manually annotated by a team of professionals in the art industry. With extensive experiments, we demonstrate that our ArtLM achieves 85.6% accuracy and 84.0% F1 score and outperforms other baseline models. We also provide a visualisation and a qualitative analysis of the artist network built from ArtLM's outputs.

artist, machine learning, pattern recognition, (19 more...)

arXiv.org Artificial Intelligence

2212.07127

Country:

North America > United States > New York (0.04)
Europe > France (0.04)
Europe > United Kingdom (0.04)
(5 more...)

Genre: Research Report > New Finding (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.46)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.35)

Add feedback

Pre-training, fine-tuning and in-context learning in Large Language Models (LLMs)

#artificialintelligenceAug-6-2022, 05:55:27 GMT

Since the advent of Transformers in 2017, Large Language Models (LLMs) have completely changed the process of training ML models for language tasks. Earlier, for a given task and a given dataset, we used to play around with various models like RNNs, LSTMs, Decision Trees, etc by training each of them on a subset of the data and testing on the rest. And whichever model gave the best accuracy was chosen as the winner. Of course, a lot of model hyper-parameters also needed to be tuned and experimented with. And for many problems, feature engineering was also necessary.

fine-tuning and in-context learning, language model, pre-training process, (11 more...)

#artificialintelligence

Industry: Education (0.31)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

Add feedback

DSGPT: Domain-Specific Generative Pre-Training of Transformers for Text Generation in E-commerce Title and Review Summarization

Zhang, Xueying, Jiang, Yunjiang, Shang, Yue, Cheng, Zhaomeng, Zhang, Chi, Fan, Xiaochuan, Xiao, Yun, Long, Bo

arXiv.org Artificial IntelligenceDec-15-2021

We propose a novel domain-specific generative pre-training (DS-GPT) method for text generation and apply it to the product titleand review summarization problems on E-commerce mobile display.First, we adopt a decoder-only transformer architecture, which fitswell for fine-tuning tasks by combining input and output all to-gether. Second, we demonstrate utilizing only small amount of pre-training data in related domains is powerful. Pre-training a languagemodel from a general corpus such as Wikipedia or the CommonCrawl requires tremendous time and resource commitment, andcan be wasteful if the downstream tasks are limited in variety. OurDSGPT is pre-trained on a limited dataset, the Chinese short textsummarization dataset (LCSTS). Third, our model does not requireproduct-related human-labeled data. For title summarization task,the state of art explicitly uses additional background knowledgein training and predicting stages. In contrast, our model implic-itly captures this knowledge and achieves significant improvementover other methods, after fine-tuning on the public Taobao.comdataset. For review summarization task, we utilize JD.com in-housedataset, and observe similar improvement over standard machinetranslation methods which lack the flexibility of fine-tuning. Ourproposed work can be simply extended to other domains for a widerange of text generation tasks.

dataset, summarization, text generation, (13 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3404835.3463037

2112.08414

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada (0.05)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Santa Clara County > Mountain View (0.04)

Genre: Research Report (0.82)

Industry: Information Technology > Services > e-Commerce Services (0.72)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Add feedback

Improving speech emotion recognition via Transformer-based Predictive Coding through transfer learning

Lian, Zheng, Li, Ya, Tao, Jianhua, Huang, Jian

arXiv.org Machine LearningNov-11-2018

Speech emotion recognition is an important aspect of human-computer interaction. Prior works propose various transfer learning approaches to deal with limited samples in speech emotion recognition. However, they require labeled data for the source task, which cost much effort to collect them. To solve this problem, we focus on the unsupervised task, predictive coding. Nearly unlimited data for most domains can be utilized. In this paper, we utilize the multi-layer Transformer model for the predictive coding, followed with transfer learning approaches to share knowledge of the pre-trained predictive model for speech emotion recognition. We conduct experiments on IEMOCAP, and experimental results reveal the advantages of the proposed method. Our method reaches 65.03% in the weighted accuracy, which also outperforms some currently advanced approaches.

artificial intelligence, emotion recognition, machine learning, (16 more...)

arXiv.org Machine Learning

1811.07691

Country: Asia > China (0.16)

Genre: Research Report (1.00)

Industry: Law > Litigation (0.86)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Emotion (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.95)

Add feedback