AITopics | large-scale language model

Collaborating Authors

large-scale language model

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

COMPACTER: Efficient Low-Rank Hypercomplex Adapter Layers

Neural Information Processing SystemsApr-24-2026, 13:31:16 GMT

Adapting large-scale pretrained language models to downstream tasks via fine-tuning is the standard method for achieving state-of-the-art performance on NLP benchmarks. However, fine-tuning all weights of models with millions or billions of parameters is sample-inefficient, unstable in low-resource settings, and wasteful as it requires storing a separate copy of the model for each task. Recent work has developed parameter-efficient fine-tuning methods, but these approaches either still require a relatively large number of parameters or underperform standard fine-tuning.

artificial intelligence, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Federated Learning-Based Data Collaboration Method for Enhancing Edge Cloud AI System Security Using Large Language Models

Luo, Huaiying, Ji, Cheng

arXiv.org Artificial IntelligenceJun-24-2025

--With the widespread application of edge computing and cloud systems in AI-driven applications, how to maintain efficient performance while ensuring data privacy has become an urgent security issue. This paper proposes a federated learning-based data collaboration method to improve the security of edge cloud AI systems, and use large-scale language models (LLMs) to enhance data privacy protection and system robustness. Based on the existing federated learning framework, this method introduces a secure multi-party computation protocol, which optimizes the data aggregation and encryption process between distributed nodes by using LLM to ensure data privacy and improve system efficiency. By combining advanced adversarial training techniques, the model enhances the resistance of edge cloud AI systems to security threats such as data leakage and model poisoning. Experimental results show that the proposed method is 15% better than the traditional federated learning method in terms of data protection and model robustness.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2506.18087

Country: North America > United States > Illinois (0.28)

Genre: Research Report (0.85)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

CriticEval: Evaluating Large-scale Language Model as Critic

Neural Information Processing SystemsMay-27-2025, 06:12:51 GMT

Critique ability, i.e., the capability of Large Language Models (LLMs) to identify and rectify flaws in responses, is crucial for their applications in self-improvement and scalable oversight. While numerous studies have been proposed to evaluate critique ability of LLMs, their comprehensiveness and reliability are still limited. To overcome this problem, we introduce CriticEval, a novel benchmark designed to comprehensively and reliably evaluate critique ability of LLMs. Specifically, to ensure the comprehensiveness, CriticEval evaluates critique ability from four dimensions across nine diverse task scenarios. It evaluates both scalar-valued and textual critiques, targeting responses of varying quality.

criticeval, evaluate critique ability, large-scale language model, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Cross-Cloud Data Privacy Protection: Optimizing Collaborative Mechanisms of AI Systems by Integrating Federated Learning and LLMs

Luo, Huaiying, Ji, Cheng

arXiv.org Artificial IntelligenceMay-20-2025

In the age of cloud computing, data privacy protection has become a major challenge, especially when sharing sensitive data across cloud environments. However, how to optimize collaboration across cloud environments remains an unresolved problem. In this paper, we combine federated learning with large-scale language models to optimize the collaborative mechanism of AI systems. Based on the existing federated learning framework, we introduce a cross-cloud architecture in which federated learning works by aggregating model updates from decentralized nodes without exposing the original data. At the same time, combined with large-scale language models, its powerful context and semantic understanding capabilities are used to improve model training efficiency and decision-making ability. We've further innovated by introducing a secure communication layer to ensure the privacy and integrity of model updates and training data. The model enables continuous model adaptation and fine-tuning across different cloud environments while protecting sensitive data. Experimental results show that the proposed method is significantly better than the traditional federated learning model in terms of accuracy, convergence speed and data privacy protection.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2505.13292

Country: North America > United States > Illinois (0.29)

Genre: Research Report > New Finding (0.34)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Cloud Computing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)

Add feedback

Adaptive Optimization for Enhanced Efficiency in Large-Scale Language Model Training

Chen, Jiajing, Liu, Bingying, Liao, Xiaoxuan, Gao, Jia, Zheng, Hongye, Li, Yue

arXiv.org Artificial IntelligenceDec-5-2024

With the rapid development of natural language processing technology, large-scale language models (LLM) have achieved remarkable results in a variety of tasks. However, how to effectively train these huge models and improve their performance and computational efficiency remains an important challenge. This paper proposes an improved method based on adaptive optimization algorithm, aiming to improve the training efficiency and final performance of LLM. Through comparative experiments on the SQuAD and GLUE data sets, the experimental results show that compared with traditional optimization algorithms (such as SGD, Momentum, AdaGrad, RMSProp and Adam), the adaptive optimization algorithm we proposed has better accuracy and F1 score. Both have achieved significant improvements, especially showed stronger training capabilities when processed large-scale texts and complex tasks. The research results verify the advantages of adaptive optimization algorithms in large-scale language model training and provide new ideas and directions for future optimization methods.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2412.04718

Country:

North America > United States > New York (0.05)
Asia > China > Hong Kong (0.04)

Genre: Research Report > New Finding (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback

Evaluating the Translation Performance of Large Language Models Based on Euas-20

Huang, Yan, Liu, Wei

arXiv.org Artificial IntelligenceAug-6-2024

In recent years, with the rapid development of deep learning technology, large language models (LLMs) such as BERT and GPT have achieved breakthrough results in natural language processing tasks. Machine translation (MT), as one of the core tasks of natural language processing, has also benefited from the development of large language models and achieved a qualitative leap. Despite the significant progress in translation performance achieved by large language models, machine translation still faces many challenges. Therefore, in this paper, we construct the dataset Euas-20 to evaluate the performance of large language models on translation tasks, the translation ability on different languages, and the effect of pre-training data on the translation ability of LLMs for researchers and developers.

language model, translation, translation performance, (14 more...)

arXiv.org Artificial Intelligence

2408.03119

Country:

Europe > Germany > North Rhine-Westphalia > Cologne Region > Aachen (0.04)
Asia > China > Henan Province > Zhengzhou (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Revolutionizing Bridge Operation and maintenance with LLM-based Agents: An Overview of Applications and Insights

Xinyu-Chen, null, Yanwen-Zhu, null, Yang-Hou, null, Lianzhen-Zhang, null

arXiv.org Artificial IntelligenceJul-13-2024

In various industrial fields of human social development, people have been exploring methods aimed at freeing human labor. Constructing LLM-based agents is considered to be one of the most effective tools to achieve this goal. Agent, as a kind of human-like intelligent entity with the ability of perception, planning, decision-making, and action, has created great production value in many fields. However, the bridge O\&M field shows a relatively low level of intelligence compared to other industries. Nevertheless, the bridge O\&M field has developed numerous intelligent inspection devices, machine learning algorithms, and autonomous evaluation and decision-making methods, which provide a feasible basis for breakthroughs in artificial intelligence in this field. The aim of this study is to explore the impact of AI bodies based on large-scale language models on the field of bridge O\&M and to analyze the potential challenges and opportunities it brings to the core tasks of bridge O\&M. Through in-depth research and analysis, this paper expects to provide a more comprehensive perspective for understanding the application of intelligentsia in this field.

arxiv preprint arxiv, intelligence, language model, (15 more...)

arXiv.org Artificial Intelligence

2407.10064

Country:

Asia > China > Heilongjiang Province > Harbin (0.04)
Asia > Japan (0.04)
Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)
(7 more...)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Health & Medicine > Consumer Health (1.00)
Construction & Engineering (1.00)
Materials > Construction Materials (0.93)
(5 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(3 more...)

Add feedback

Efficiency optimization of large-scale language models based on deep learning in natural language processing tasks

Mei, Taiyuan, Zi, Yun, Cheng, Xiaohan, Gao, Zijun, Wang, Qi, Yang, Haowei

arXiv.org Artificial IntelligenceMay-19-2024

The internal structure and operation mechanism of large-scale language models are analyzed theoretically, especially how Transformer and its derivative architectures can restrict computing efficiency while capturing long-term dependencies. Further, we dig deep into the efficiency bottleneck of the training phase, and evaluate in detail the contribution of adaptive optimization algorithms (such as AdamW), massively parallel computing techniques, and mixed precision training strategies to accelerate convergence and reduce memory footprint. By analyzing the mathematical principles and implementation details of these algorithms, we reveal how they effectively improve training efficiency in practice. In terms of model deployment and inference optimization, this paper systematically reviews the latest advances in model compression techniques, focusing on strategies such as quantification, pruning, and knowledge distillation. By comparing the theoretical frameworks of these techniques and their effects in different application scenarios, we demonstrate their ability to significantly reduce model size and inference delay while maintaining model prediction accuracy. In addition, this paper critically examines the limitations of current efficiency optimization methods, such as the increased risk of overfitting, the control of performance loss after compression, and the problem of algorithm generality, and proposes some prospects for future research. In conclusion, this study provides a comprehensive theoretical framework for understanding the efficiency optimization of large-scale language models.

arxiv preprint arxiv, distillation, transformer model, (13 more...)

arXiv.org Artificial Intelligence

2405.11704

Country: North America > United States (0.05)

Genre:

Research Report > Experimental Study (0.48)
Research Report > New Finding (0.46)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

From ChatGPT, DALL-E 3 to Sora: How has Generative AI Changed Digital Humanities Research and Services?

Liu, Jiangfeng, Wang, Ziyi, Xie, Jing, Pei, Lei

arXiv.org Artificial IntelligenceApr-29-2024

Generative large-scale language models create the fifth paradigm of scientific research, organically combine data science and computational intelligence, transform the research paradigm of natural language processing and multimodal information processing, promote the new trend of AI-enabled social science research, and provide new ideas for digital humanities research and application. This article profoundly explores the application of large-scale language models in digital humanities research, revealing their significant potential in ancient book protection, intelligent processing, and academic innovation. The article first outlines the importance of ancient book resources and the necessity of digital preservation, followed by a detailed introduction to developing large-scale language models, such as ChatGPT, and their applications in document management, content understanding, and cross-cultural research. Through specific cases, the article demonstrates how AI can assist in the organization, classification, and content generation of ancient books. Then, it explores the prospects of AI applications in artistic innovation and cultural heritage preservation. Finally, the article explores the challenges and opportunities in the interaction of technology, information, and society in the digital humanities triggered by AI technologies.

ai technology, application, language model, (12 more...)

arXiv.org Artificial Intelligence

2404.18518

Country: Asia > China > Jiangsu Province > Nanjing (0.05)

Genre: Research Report > New Finding (0.46)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

Add feedback

PROXYQA: An Alternative Framework for Evaluating Long-Form Text Generation with Large Language Models

Tan, Haochen, Guo, Zhijiang, Shi, Zhan, Xu, Lu, Liu, Zhili, Feng, Yunlong, Li, Xiaoguang, Wang, Yasheng, Shang, Lifeng, Liu, Qun, Song, Linqi

arXiv.org Artificial IntelligenceFeb-11-2024

Large Language Models (LLMs) have exhibited remarkable success in long-form context comprehension tasks. However, their capacity to generate long contents, such as reports and articles, remains insufficiently explored. Current benchmarks do not adequately assess LLMs' ability to produce informative and comprehensive content, necessitating a more rigorous evaluation approach. In this study, we introduce \textsc{ProxyQA}, a framework for evaluating long-form text generation, comprising in-depth human-curated \textit{meta-questions} spanning various domains. Each meta-question contains corresponding \textit{proxy-questions} with annotated answers. LLMs are prompted to generate extensive content in response to these meta-questions. Utilizing an evaluator and incorporating generated content as background context, \textsc{ProxyQA} evaluates the quality of generated content based on the evaluator's performance in answering the \textit{proxy-questions}. We examine multiple LLMs, emphasizing \textsc{ProxyQA}'s demanding nature as a high-quality assessment tool. Human evaluation demonstrates that evaluating through \textit{proxy-questions} is a highly self-consistent and human-criteria-correlated validation method. The dataset and leaderboard will be available at \url{https://github.com/Namco0816/ProxyQA}.

computational linguistic, language model, parallelism, (15 more...)

arXiv.org Artificial Intelligence

2401.15042

Country:

Asia > Singapore (0.04)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
North America > Canada > Ontario > Toronto (0.04)
(12 more...)

Genre: Research Report > New Finding (0.34)

Industry: Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback