AITopics

2409.04269

Country:

North America > United States > Florida (0.04)
Europe > Belgium (0.04)
Asia > Uzbekistan > Karakalpakstan (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Heo, DongNyeong, Rim, Daniela Noemi, Choi, Heeyoul

N-gram Prediction and Word Difference Representations for Language Modeling

arXiv.org Artificial IntelligenceSep-5-2024

Causal language modeling (CLM) serves as the foundational framework underpinning remarkable successes of recent large language models (LLMs). Despite its success, the training approach for next word prediction poses a potential risk of causing the model to overly focus on local dependencies within a sentence. While prior studies have been introduced to predict future N words simultaneously, they were primarily applied to tasks such as masked language modeling (MLM) and neural machine translation (NMT). In this study, we introduce a simple N-gram prediction framework for the CLM task. Moreover, we introduce word difference representation (WDR) as a surrogate and contextualized target representation during model training on the basis of N-gram prediction framework. To further enhance the quality of next word prediction, we propose an ensemble method that incorporates the future N words' prediction results. Empirical evaluations across multiple benchmark datasets encompassing CLM and NMT tasks demonstrate the significant advantages of our proposed methods over the conventional CLM.

large language model, machine learning, natural language, (20 more...)

2409.03295

Country:

Asia > South Korea (0.04)
Europe > Belgium (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial IntelligenceSep-5-2024

Painful intelligence: What AI can tell us about human suffering

Hyvärinen, Aapo

This book uses the modern theory of artificial intelligence (AI) to understand human suffering or mental pain. Both humans and sophisticated AI agents process information about the world in order to achieve goals and obtain rewards, which is why AI can be used as a model of the human brain and mind. This book intends to make the theory accessible to a relatively general audience, requiring only some relevant scientific background. The book starts with the assumption that suffering is mainly caused by frustration. Frustration means the failure of an agent (whether AI or human) to achieve a goal or a reward it wanted or expected. Frustration is inevitable because of the overwhelming complexity of the world, limited computational resources, and scarcity of good data. In particular, such limitations imply that an agent acting in the real world must cope with uncontrollability, unpredictability, and uncertainty, which all lead to frustration. Fundamental in such modelling is the idea of learning, or adaptation to the environment. While AI uses machine learning, humans and animals adapt by a combination of evolutionary mechanisms and ordinary learning. Even frustration is fundamentally an error signal that the system uses for learning. This book explores various aspects and limitations of learning algorithms and their implications regarding suffering. At the end of the book, the computational theory is used to derive various interventions or training methods that will reduce suffering in humans. The amount of frustration is expressed by a simple equation which indicates how it can be reduced. The ensuing interventions are very similar to those proposed by Buddhist and Stoic philosophy, and include mindfulness meditation. Therefore, this book can be interpreted as an exposition of a computational theory justifying why such philosophies and meditation reduce human suffering.

buddhist meditation manual visuddhimagga, pattern analysis and machine intelligence, unpredictability and uncontrollability, (15 more...)

2205.15409

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.13)
North America > United States > New York (0.04)
(14 more...)

Genre:

Workflow (1.00)
Summary/Review (1.00)
Research Report (1.00)
(4 more...)

Industry:

Transportation (1.00)
Media (1.00)
Leisure & Entertainment > Games (1.00)
(13 more...)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(16 more...)

arXiv.org Artificial IntelligenceSep-4-2024

A Comparative Study of Pre-training and Self-training

Wang, Yiheng, Lin, Jiayu, Lin, Zuoquan

Pre-training and self-training are two approaches to semi-supervised learning. The comparison between pre-training and self-training has been explored. However, the previous works led to confusing findings: self-training outperforms pre-training experienced on some tasks in computer vision, and contrarily, pre-training outperforms self-training experienced on some tasks in natural language processing, under certain conditions of incomparable settings. We propose, comparatively and exhaustively, an ensemble method to empirical study all feasible training paradigms combining pre-training, self-training, and fine-tuning within consistent foundational settings comparable to data augmentation. We conduct experiments on six datasets, four data augmentation, and imbalanced data for sentiment analysis and natural language inference tasks. Our findings confirm that the pre-training and fine-tuning paradigm yields the best overall performances. Moreover, self-training offers no additional benefits when combined with semi-supervised pre-training.

large language model, machine learning, paradigm, (16 more...)

2409.02751

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Dominican Republic (0.04)
North America > United States > Washington > King County > Seattle (0.04)
(7 more...)

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

arXiv.org Artificial IntelligenceSep-4-2024

Creating Domain-Specific Translation Memories for Machine Translation Fine-tuning: The TRENCARD Bilingual Cardiology Corpus

Dogru, Gokhan

This article investigates how translation memories (TM) can be created by translators or other language professionals in order to compile domain-specific parallel corpora , which can then be used in different scenarios, such as machine translation training and fine-tuning, TM leveraging, and/or large language model fine-tuning. The article introduces a semi-automatic TM preparation methodology leveraging primarily translation tools used by translators in favor of data quality and control by the translators. This semi-automatic methodology is then used to build a cardiology-based Turkish -> English corpus from bilingual abstracts of Turkish cardiology journals. The resulting corpus called TRENCARD Corpus has approximately 800,000 source words and 50,000 sentences. Using this methodology, translators can build their custom TMs in a reasonable time and use them in their bilingual data requiring tasks.

translation, translator, website, (15 more...)

2409.02667

Country:

Asia > Middle East > Republic of Türkiye (0.14)
Europe > Netherlands > North Holland > Amsterdam (0.04)
North America > United States > New York (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Kowtal, Nidhi, Deshpande, Tejas, Joshi, Raviraj

A Data Selection Approach for Enhancing Low Resource Machine Translation Using Cross-Lingual Sentence Representations

arXiv.org Artificial IntelligenceSep-4-2024

Machine translation in low-resource language pairs faces significant challenges due to the scarcity of parallel corpora and linguistic resources. This study focuses on the case of English-Marathi language pairs, where existing datasets are notably noisy, impeding the performance of machine translation models. To mitigate the impact of data quality issues, we propose a data filtering approach based on cross-lingual sentence representations. Our methodology leverages a multilingual SBERT model to filter out problematic translations in the training data. Specifically, we employ an IndicSBERT similarity model to assess the semantic equivalence between original and translated sentences, allowing us to retain linguistically correct translations while discarding instances with substantial deviations. The results demonstrate a significant improvement in translation quality over the baseline post-filtering with IndicSBERT. This illustrates how cross-lingual sentence representations can reduce errors in machine translation scenarios with limited resources. By integrating multilingual sentence BERT models into the translation pipeline, this research contributes to advancing machine translation techniques in low-resource environments. The proposed method not only addresses the challenges in English-Marathi language pairs but also provides a valuable framework for enhancing translation quality in other low-resource language translation tasks.

dataset, machine translation, translation, (13 more...)

doi: 10.1109/I2CT61223.2024.10544178

2409.02712

Country: Asia > India > Maharashtra > Pune (0.05)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Shakil, Hassan, Farooq, Ahmad, Kalita, Jugal

Abstractive Text Summarization: State of the Art, Challenges, and Improvements

Specifically focusing on the landscape of abstractive text summarization, as opposed to extractive techniques, this survey presents a comprehensive overview, delving into state-of-the-art techniques, prevailing challenges, and prospective research directions. We categorize the techniques into traditional sequence-to-sequence models, pre-trained large language models, reinforcement learning, hierarchical methods, and multi-modal summarization. Unlike prior works that did not examine complexities, scalability and comparisons of techniques in detail, this review takes a comprehensive approach encompassing state-of-the-art methods, challenges, solutions, comparisons, limitations and charts out future improvements - providing researchers an extensive overview to advance abstractive summarization research. We provide vital comparison tables across techniques categorized - offering insights into model complexity, scalability and appropriate applications. The paper highlights challenges such as inadequate meaning representation, factual consistency, controllable text summarization, cross-lingual summarization, and evaluation metrics, among others. Solutions leveraging knowledge incorporation and other innovative strategies are proposed to address these challenges. The paper concludes by highlighting emerging research areas like factual inconsistency, domain-specific, cross-lingual, multilingual, and long-document summarization, as well as handling noisy data. Our objective is to provide researchers and practitioners with a structured overview of the domain, enabling them to better understand the current landscape and identify potential areas for further research and improvement.

abstractive text summarization, summarization, text summarization, (13 more...)

doi: 10.1016/j.neucom.2024.128255

2409.02413

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > India > NCT > New Delhi (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(13 more...)

Genre:

Research Report > Promising Solution (1.00)
Research Report > New Finding (1.00)
Overview (1.00)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(5 more...)

Tonja, Atnafu Lambebo, Dossou, Bonaventure F. P., Ojo, Jessica, Rajab, Jenalea, Thior, Fadel, Wairagala, Eric Peter, Aremu, Anuoluwapo, Moiloa, Pelonomi, Abbott, Jade, Marivate, Vukosi, Rosman, Benjamin

InkubaLM: A small language model for low-resource African languages

High-resource language models often fall short in the African context, where there is a critical need for models that are efficient, accessible, and locally relevant, even amidst significant computing and data constraints. This paper introduces InkubaLM, a small language model with 0.4 billion parameters, which achieves performance comparable to models with significantly larger parameter counts and more extensive training data on tasks such as machine translation, question-answering, AfriMMLU, and the AfriXnli task. Notably, InkubaLM outperforms many larger models in sentiment analysis and demonstrates remarkable consistency across multiple languages. This work represents a pivotal advancement in challenging the conventional paradigm that effective language models must rely on substantial resources. Our model and datasets are publicly available at https://huggingface.co/lelapa to encourage research and development on low-resource languages.

african language, dataset, inkubalm-0, (13 more...)

2408.17024

Country:

North America > United States (0.06)
Asia > Middle East > Israel (0.05)
Africa > Niger (0.04)
(22 more...)

Genre: Research Report (0.82)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.98)

Vashishth, Shikhar, Singh, Harman, Bharadwaj, Shikhar, Ganapathy, Sriram, Asawaroengchai, Chulayuth, Audhkhasi, Kartik, Rosenberg, Andrew, Bapna, Ankur, Ramabhadran, Bhuvana

STAB: Speech Tokenizer Assessment Benchmark

Representing speech as discrete tokens provides a framework for transforming speech into a format that closely resembles text, thus enabling the use of speech as an input to the widely successful large language models (LLMs). Currently, while several speech tokenizers have been proposed, there is ambiguity regarding the properties that are desired from a tokenizer for specific downstream tasks and its overall generalizability. Evaluating the performance of tokenizers across different downstream tasks is a computationally intensive effort that poses challenges for scalability. To circumvent this requirement, we present STAB (Speech Tokenizer Assessment Benchmark), a systematic evaluation framework designed to assess speech tokenizers comprehensively and shed light on their inherent characteristics. This framework provides a deeper understanding of the underlying mechanisms of speech tokenization, thereby offering a valuable resource for expediting the advancement of future tokenizer models and enabling comparative analysis using a standardized benchmark. We evaluate the STAB metrics and correlate this with downstream task performance across a range of speech tasks and tokenizer choices.

downstream task, representation, tokenizer, (15 more...)

2409.02384

Country:

North America > United States (0.14)
Europe > Germany > Berlin (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.97)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.69)

Sible, Kenneth J., Chiang, David

Improving Rare Word Translation With Dictionaries and Attention Masking

In machine translation, rare words continue to be a problem for the dominant encoder-decoder architecture, especially in low-resource and out-of-domain translation settings. Human translators solve this problem with monolingual or bilingual dictionaries. In this paper, we propose appending definitions from a bilingual dictionary to source sentences and using attention masking to link together rare words with their definitions. We find that including definitions for rare words improves performance by up to 1.0 BLEU and 1.6 MacroF1.

computational linguistic, rare word, translation, (14 more...)

2408.09075

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Asia > China > Hong Kong (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
(10 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)