AITopics | prolong

LongAttn: Selecting Long-context Training Data via Token-level Attention

Wu, Longyun, Zhu, Dawei, Zhao, Guangxiang, Yu, Zhuocheng, Ran, Junfeng, Wong, Xiangyu, Sun, Lin, Li, Sujian

arXiv.org Artificial Intelligence, Feb-27-2025

With the development of large language models (LLMs), there is a growing need for better handling of long contexts. To enhance long-context capabilities, constructing high-quality training data with long-range dependencies is crucial. Existing methods for selecting long-context data often rely on sentence-level analysis, which leaves considerable room for improvement in both performance and efficiency. In this paper, we propose a novel token-level framework, LongAttn, which leverages the self-attention mechanism of LLMs to measure the long-range dependencies in the data. By calculating token-level dependency strength and the distribution uniformity of token scores, LongAttn effectively quantifies long-range dependencies, enabling more accurate and efficient data selection. We filter LongABC-32K from open-source long-context datasets (ArXiv, Book, and Code). In comprehensive experiments, LongAttn demonstrates its effectiveness, scalability, and efficiency. To facilitate future research on long-context data, we release our code and the high-quality long-context training data LongABC-32K.
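As a rough illustration of the two quantities the abstract names, the sketch below scores a toy causal attention matrix. It is not the LongAttn implementation: the `min_distance` cutoff, the use of mean distant-attention mass as "dependency strength", and normalized entropy as "uniformity" are all assumptions made for this example, and the attention matrices are hand-built rather than taken from a model.

```python
import math


def long_range_scores(attn, min_distance):
    """Per-token attention mass placed on keys at least `min_distance` positions back.

    `attn[q][k]` is the (causal) attention weight from query token q to key token k.
    """
    scores = []
    for q, row in enumerate(attn):
        far_end = max(q - min_distance + 1, 0)  # keys 0..far_end-1 count as distant
        scores.append(sum(row[:far_end]))
    return scores


def dependency_strength(scores):
    """Assumed definition: average distant-attention mass over all tokens."""
    return sum(scores) / len(scores) if scores else 0.0


def uniformity(scores):
    """Assumed definition: entropy of the normalized token scores, scaled to [0, 1]."""
    total = sum(scores)
    n = len(scores)
    if total <= 0 or n < 2:
        return 0.0
    probs = [s / total for s in scores if s > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(n)


# Toy inputs (hypothetical): 4 tokens with causal attention.
# `uniform` spreads attention evenly over all earlier tokens;
# `local` attends only to the current token.
uniform = [[1.0 / (q + 1) if k <= q else 0.0 for k in range(4)] for q in range(4)]
local = [[1.0 if k == q else 0.0 for k in range(4)] for q in range(4)]

for name, attn in (("uniform", uniform), ("local", local)):
    scores = long_range_scores(attn, min_distance=2)
    print(name, round(dependency_strength(scores), 3), round(uniformity(scores), 3))
```

Under these assumptions, a document whose tokens attend only locally gets a strength of zero, while one that spreads attention over distant tokens scores higher. A selection pipeline would rank documents by some combination of the two values.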

arxiv preprint arxiv, dependency, longattn, (15 more...)

2502.1686

Country:

Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
North America > United States > Florida > Miami-Dade County > Miami (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(2 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)

How to Train Long-Context Language Models (Effectively)

Gao, Tianyu, Wettig, Alexander, Yen, Howard, Chen, Danqi

arXiv.org Artificial Intelligence, Oct-3-2024

We study continued training and supervised fine-tuning (SFT) of a language model (LM) to make effective use of long-context information. We first establish a reliable evaluation protocol to guide model development: instead of perplexity or simple needle-in-a-haystack (NIAH) tests, we use a broad set of long-context tasks, and we evaluate models after SFT with instruction data, as this better reveals long-context abilities. Supported by our robust evaluations, we run thorough experiments to decide the data mix for continued pre-training, the instruction tuning dataset, and many other design choices. We find that (1) code repositories and books are excellent sources of long data, but it is crucial to combine them with high-quality short data; (2) training with a sequence length beyond the evaluation length boosts long-context performance; (3) for SFT, using only short instruction datasets yields strong performance on long-context tasks. Our final model, ProLong-8B, which is initialized from Llama-3 and trained on 40B tokens, demonstrates state-of-the-art long-context performance among similarly sized models at a length of 128K. ProLong outperforms Llama-3.1-8B-Instruct on the majority of long-context tasks despite having seen only 5% as many tokens during long-context training. Additionally, ProLong can effectively process up to 512K tokens, one of the longest context windows of publicly available LMs.
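Finding (1) above, combining long sources with short data, can be pictured with a small sampler. This is a minimal sketch, not the authors' recipe: the `long_fraction` value, the whitespace token count, and the function names are all hypothetical, and the abstract does not state the mixing ratio that was used.

```python
import random


def count_tokens(doc):
    """Crude whitespace token count (assumption); never returns less than 1."""
    return max(1, len(doc.split()))


def build_mix(long_docs, short_docs, long_fraction, token_budget, seed=0):
    """Sample documents with replacement until each source fills its token share."""
    rng = random.Random(seed)
    pools = {"long": long_docs, "short": short_docs}
    budgets = {
        "long": long_fraction * token_budget,
        "short": (1 - long_fraction) * token_budget,
    }
    mix = []
    for name in ("long", "short"):
        if budgets[name] > 0 and not pools[name]:
            raise ValueError(f"no documents available for source '{name}'")
        used = 0
        while used < budgets[name]:
            doc = rng.choice(pools[name])
            mix.append((name, doc))
            used += count_tokens(doc)
    rng.shuffle(mix)  # interleave long and short documents
    return mix


# Hypothetical toy corpora: one 100-token "long" document and one 10-token "short" one.
long_docs = ["w " * 100]
short_docs = ["w " * 10]
mix = build_mix(long_docs, short_docs, long_fraction=0.6, token_budget=1000)

long_tokens = sum(count_tokens(d) for name, d in mix if name == "long")
total_tokens = sum(count_tokens(d) for _, d in mix)
print(f"long-token share: {long_tokens / total_tokens:.2f}")
```

Budgeting by tokens rather than by document count matters here, because a handful of long documents can otherwise dominate the mix.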

arxiv preprint arxiv, computational linguistic, train long-context language model, (11 more...)

2410.0266

Country:

Asia > Singapore (0.04)
Asia > Indonesia > Bali (0.04)
Asia > Middle East > Jordan (0.04)
(6 more...)

Genre: Research Report (1.00)

Industry:

Information Technology (0.67)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Long Context is Not Long at All: A Prospector of Long-Dependency Data for Large Language Models

Chen, Longze, Liu, Ziqiang, He, Wanwei, Li, Yunshui, Luo, Run, Yang, Min

arXiv.org Artificial Intelligence, May-28-2024

Long-context modeling capabilities are important for large language models (LLMs) in various applications. However, directly training LLMs with long context windows is insufficient to enhance this capability, since some training samples do not exhibit strong semantic dependencies across long contexts. In this study, we propose a data mining framework, ProLong, that assigns each training sample a long-dependency score, which can be used to rank and filter samples that are more advantageous for enhancing long-context modeling abilities in LLM training. Specifically, we first use delta perplexity scores to measure the Dependency Strength between text segments in a given document. Then we refine this metric based on the Dependency Distance of these segments to incorporate spatial relationships across long contexts. Final results are calibrated with a Dependency Specificity metric to prevent trivial dependencies introduced by repetitive patterns. Moreover, a random sampling approach is proposed to optimize the computational efficiency of ProLong. Comprehensive experiments on multiple benchmarks indicate that ProLong effectively identifies documents that carry long dependencies, and that LLMs trained on these documents exhibit significantly enhanced long-context modeling capabilities.
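The first two steps described above (a delta-perplexity strength between segments, weighted by their distance) can be sketched as follows. This is an illustrative approximation, not the ProLong code: `toy_nll` is a hypothetical stand-in for a language model's loss, the linear distance weight is an assumption, and the Dependency Specificity calibration and random sampling are omitted.

```python
def toy_nll(segment, context):
    """Stand-in for an LM loss (assumption): share of segment words unseen in context.

    A real pipeline would use the model's average negative log-likelihood of
    `segment`, with and without `context` prepended.
    """
    words = segment.lower().split()
    if not words:
        return 0.0
    seen = set(context.lower().split())
    return sum(w not in seen for w in words) / len(words)


def long_dependency_score(segments, nll=toy_nll):
    """Average distance-weighted loss reduction over all ordered segment pairs."""
    n = len(segments)
    if n < 2:
        return 0.0
    total = 0.0
    pairs = 0
    for j in range(1, n):
        base = nll(segments[j], "")  # loss of segment j with no context
        for i in range(j):
            # Strength: how much earlier segment i lowers the loss of segment j.
            strength = max(base - nll(segments[j], segments[i]), 0.0)
            # Distance weight (assumed linear): farther pairs count for more.
            distance = (j - i) / (n - 1)
            total += strength * distance
            pairs += 1
    return total / pairs


# Hypothetical toy documents, already split into segments.
far_dependency = ["alpha beta gamma", "one two three", "alpha beta gamma"]
no_dependency = ["a b", "c d", "e f"]

print(long_dependency_score(far_dependency))
print(long_dependency_score(no_dependency))
```

Note that the toy loss rewards plain repetition, which is exactly the kind of trivial dependency the Dependency Specificity calibration in the abstract is meant to discount. A fuller sketch would need that third term.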

arxiv preprint arxiv, dependency, prolong, (14 more...)

2405.17915

Country:

Asia > Vietnam > Long An Province (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

Catch and Prolong: recurrent neural network for seeking track-candidates

Baranov, Dmitriy, Ososkov, Gennady, Goncharov, Pavel, Tsytrinov, Andrei

arXiv.org Machine Learning, Nov-14-2018

One of the most important problems of data processing in high-energy and nuclear physics is event reconstruction. Its main part is the track reconstruction procedure, which consists of finding all the tracks that elementary particles leave when they pass through a detector among a huge number of points, so-called hits, produced when flying particles fire the detector coordinate planes. Unfortunately, tracking is seriously impeded by a well-known shortcoming of multiwire, strip, and GEM detectors: the appearance of many fake hits caused by spurious crossings of fired strips. Since the number of these fakes is several orders of magnitude greater than the number of true hits, it is very difficult to unravel possible track-candidates from true hits while ignoring fakes. We introduce a renewed method that is a significant improvement on our previous two-stage approach, which was based on hit preprocessing using a directed K-d tree search followed by a deep neural classifier. We combine these two stages into one by applying a recurrent neural network that simultaneously determines whether a set of points belongs to a true track and predicts where to look for the next point of the track on the next coordinate plane of the detector. We show that the proposed deep network is more accurate and faster, and does not require any special preprocessing stage. Preliminary results of our approach for simulated events of the BM@N GEM detector are presented.

artificial intelligence, machine learning, recurrent neural network, (18 more...)

1811.06002

Country:

Europe > Belarus (0.15)
Europe > Russia (0.14)
Europe > Montenegro (0.14)

Genre: Research Report (0.64)

Industry: Information Technology (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
