AITopics

2502.10852

Country:

Europe (1.00)
North America (0.93)
Asia > China (0.48)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.94)

arXiv.org Artificial IntelligenceFeb-13-2025

Salamandra Technical Report

Gonzalez-Agirre, Aitor, Pàmies, Marc, Llop, Joan, Baucells, Irene, Da Dalt, Severino, Tamayo, Daniel, Saiz, José Javier, Espuña, Ferran, Prats, Jaume, Aula-Blasco, Javier, Mina, Mario, Pikabea, Iñigo, Rubio, Adrián, Shvets, Alexander, Sallés, Anna, Lacunza, Iñaki, Palomar, Jorge, Falcão, Júlia, Tormo, Lucía, Vasquez-Reina, Luis, Marimon, Montserrat, Pareras, Oriol, Ruiz-Fernández, Valle, Villegas, Marta

This work introduces Salamandra, a suite of open-source decoder-only large language models available in three different sizes: 2, 7, and 40 billion parameters. The models were trained from scratch on highly multilingual data that comprises text in 35 European languages and code. Our carefully curated corpus is made exclusively from open-access data compiled from a wide variety of sources. Along with the base models, supplementary checkpoints that were fine-tuned on public-domain instruction data are also released for chat applications. Additionally, we also share our preliminary experiments on multimodality, which serve as proof-of-concept to showcase potential applications for the Salamandra family. Our extensive evaluations on multilingual benchmarks reveal that Salamandra has strong capabilities, achieving competitive performance when compared to similarly sized open-source models. We provide comprehensive evaluation results both on standard downstream tasks as well as key aspects related to bias and safety.With this technical report, we intend to promote open science by sharing all the details behind our design choices, data curation strategy and evaluation methodology. In addition to that, we deviate from the usual practice by making our training and evaluation scripts publicly accessible. We release all models under a permissive Apache 2.0 license in order to foster future research and facilitate commercial use, thereby contributing to the open-source ecosystem of large language models.

kideak modu eraginkorragoan aurkitzen zituzten, large language model, machine learning, (19 more...)

2502.08489

Country:

Europe > Spain (1.00)
North America > United States (0.92)
Asia > Middle East > UAE (0.45)

Genre: Research Report > New Finding (0.92)

Industry:

Media (1.00)
Leisure & Entertainment (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
(4 more...)

Technology:

Information Technology > Software (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
(3 more...)

arXiv.org Machine LearningFeb-13-2025

Theoretical Benefit and Limitation of Diffusion Language Model

Feng, Guhao, Geng, Yihan, Guan, Jian, Wu, Wei, Wang, Liwei, He, Di

Diffusion language models have emerged as a promising approach for text generation. One would naturally expect this method to be an efficient replacement for autoregressive models since multiple tokens can be sampled in parallel during each diffusion step. However, its efficiency-accuracy trade-off is not yet well understood. In this paper, we present a rigorous theoretical analysis of a widely used type of diffusion language model, the Masked Diffusion Model (MDM), and find that its effectiveness heavily depends on the target evaluation metric. Under mild conditions, we prove that when using perplexity as the metric, MDMs can achieve near-optimal perplexity in sampling steps regardless of sequence length, demonstrating that efficiency can be achieved without sacrificing performance. However, when using the sequence error rate--which is important for understanding the "correctness" of a sequence, such as a reasoning chain--we show that the required sampling steps must scale linearly with sequence length to obtain "correct" sequences, thereby eliminating MDM's efficiency advantage over autoregressive models. Our analysis establishes the first theoretical foundation for understanding the benefits and limitations of MDMs. All theoretical findings are supported by empirical studies.

large language model, machine learning, natural language, (16 more...)

arXiv.org Machine Learning

2502.09622

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Artificial IntelligenceFeb-12-2025

Are Expressions for Music Emotions the Same Across Cultures?

Celen, Elif, van Rijn, Pol, Lee, Harin, Jacoby, Nori

Music evokes profound emotions, yet the universality of emotional descriptors across languages remains debated. A key challenge in cross-cultural research on music emotion is biased stimulus selection and manual curation of taxonomies, predominantly relying on Western music and languages. To address this, we propose a balanced experimental design with nine online experiments in Brazil, the US, and South Korea, involving N=672 participants. First, we sample a balanced set of popular music from these countries. Using an open-ended tagging pipeline, we then gather emotion terms to create culture-specific taxonomies. Finally, using these bottom-up taxonomies, participants rate emotions of each song. This allows us to map emotional similarities within and across cultures. Results show consistency in high arousal, high valence emotions but greater variability in others. Notably, machine translations were often inadequate to capture music-specific meanings. These findings together highlight the need for a domain-sensitive, open-ended, bottom-up emotion elicitation approach to reduce cultural biases in emotion research.

artificial intelligence, machine translation, natural language, (18 more...)

2502.08744

Country:

South America > Brazil (0.28)
Asia > South Korea (0.24)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
(7 more...)

Genre: Research Report > New Finding (0.88)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.34)

FOX NewsFeb-11-2025, 11:00:53 GMT

Top translation apps for travelers

Navi allows you to understand anyone, anywhere, anytime. Traveling abroad can be an incredible adventure, but the language barrier can make even the most intrepid explorer nervous. The fear of being unable to communicate or understand signs and menus keeps many would-be travelers from ever leaving their home country. With the rise of powerful translation apps, you can harness the latest technology right on your smartphone to bridge the language gap wherever your journey takes you. I've tested out the top contenders to bring you my picks for the best translation apps for global travelers.

artificial intelligence, natural language, translation app, (18 more...)

FOX News

Technology:

Information Technology > Communications > Mobile (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.61)

Shahbazi, Ashkan, Akbari, Elaheh, Salehi, Darian, Liu, Xinran, Naderializadeh, Navid, Kolouri, Soheil

ESPFormer: Doubly-Stochastic Attention with Expected Sliced Transport Plans

While self-attention has been instrumental in the success of Transformers, it can lead to over-concentration on a few tokens during training, resulting in suboptimal information flow. Enforcing doubly-stochastic constraints in attention matrices has been shown to improve structure and balance in attention distributions. However, existing methods rely on iterative Sinkhorn normalization, which is computationally costly. In this paper, we introduce a novel, fully parallelizable doubly-stochastic attention mechanism based on sliced optimal transport, leveraging Expected Sliced Transport Plans (ESP). Unlike prior approaches, our method enforces double stochasticity without iterative Sinkhorn normalization, significantly enhancing efficiency. To ensure differentiability, we incorporate a temperature-based soft sorting technique, enabling seamless integration into deep learning models. Experiments across multiple benchmark datasets, including image classification, point cloud classification, sentiment analysis, and neural machine translation, demonstrate that our enhanced attention regularization consistently improves performance across diverse applications.

artificial intelligence, machine learning, natural language, (18 more...)

2502.07962

Country:

North America > United States > Tennessee > Davidson County > Nashville (0.04)
North America > United States > North Carolina > Durham County > Durham (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
Asia > Middle East > Israel (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)

Unsupervised Translation of Emergent Communication

Levy, Ido, Paradise, Orr, Carmeli, Boaz, Meir, Ron, Goldwasser, Shafi, Belinkov, Yonatan

Emergent Communication (EC) provides a unique window into the language systems that emerge autonomously when agents are trained to jointly achieve shared goals. However, it is difficult to interpret EC and evaluate its relationship with natural languages (NL). This study employs unsupervised neural machine translation (UNMT) techniques to decipher ECs formed during referential games with varying task complexities, influenced by the semantic diversity of the environment. Our findings demonstrate UNMT's potential to translate EC, illustrating that task complexity characterized by semantic diversity enhances EC translatability, while higher task complexity with constrained semantic variability exhibits pragmatic EC, which, although challenging to interpret, remains suitable for translation. This research marks the first attempt, to our knowledge, to translate EC without the aid of parallel data.

artificial intelligence, machine learning, natural language, (17 more...)

2502.07552

Country:

North America > United States > California > Alameda County > Berkeley (0.14)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(4 more...)

Genre: Research Report > New Finding (1.00)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Michail, Andrianos, Raclé, Corina Julia, Opitz, Juri, Clematide, Simon

Adapting Multilingual Embedding Models to Historical Luxembourgish

The growing volume of digitized historical texts requires effective semantic search using text embeddings. However, pre-trained multilingual models, typically evaluated on contemporary texts, face challenges with historical digitized content due to OCR noise and outdated spellings. We explore the use of multilingual embeddings for cross-lingual semantic search on historical Luxembourgish, a low-resource language. We collect historical Luxembourgish news articles spanning various time periods and use GPT-4o to segment and translate them into closely related languages, creating 20,000 parallel training sentences per language pair. We further create a historical bitext mining evaluation set and find that these models struggle to perform cross-lingual search on historical Luxembourgish. To address this, we propose a simple adaptation method using in-domain training data, achieving up to 98\% accuracy in cross-lingual evaluations. We release our adapted models and historical Luxembourgish-German/French bitexts to support further research.

large language model, machine learning, natural language, (21 more...)

2502.07938

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
Europe > Switzerland > Zürich > Zürich (0.05)
North America > United States > Florida > Miami-Dade County > Miami (0.04)
(3 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.36)

Khaniki, Mohammad Ali Labbaf, Saadati, Sahabeh, Manthouri, Mohammad

An Advanced NLP Framework for Automated Medical Diagnosis with DeBERTa and Dynamic Contextual Positional Gating

This paper presents a novel Natural Language Processing (NLP) framework for enhancing medical diagnosis through the integration of advanced techniques in data augmentation, feature extraction, and classification. The proposed approach employs back-translation to generate diverse paraphrased datasets, improving robustness and mitigating overfitting in classification tasks. Leveraging Decoding-enhanced BERT with Disentangled Attention (DeBERTa) with Dynamic Contextual Positional Gating (DCPG), the model captures fine-grained contextual and positional relationships, dynamically adjusting the influence of positional information based on semantic context to produce high-quality text embeddings. For classification, an Attention-Based Feedforward Neural Network (ABFNN) is utilized, effectively focusing on the most relevant features to improve decision-making accuracy. Applied to the classification of symptoms, clinical notes, and other medical texts, this architecture demonstrates its ability to address the complexities of medical data. The combination of data augmentation, contextual embedding generation, and advanced classification mechanisms offers a robust and accurate diagnostic tool, with potential applications in automated medical diagnosis and clinical decision support. This method demonstrates the effectiveness of the proposed NLP framework for medical diagnosis, achieving remarkable results with an accuracy of 99.78%, recall of 99.72%, precision of 99.79%, and an F1-score of 99.75%. These metrics not only underscore the model's robust performance in classifying medical texts with exceptional precision and reliability but also highlight its superiority over existing methods, making it a highly promising tool for automated diagnostic systems.

large language model, machine learning, natural language, (20 more...)

2502.07755

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Middle East > Iran > Tehran Province > Tehran (0.05)
Europe > Finland > Uusimaa > Helsinki (0.04)

Genre:

Research Report > Experimental Study (1.00)
Overview (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Diagnostic Medicine (1.00)
Health & Medicine > Health Care Technology > Telehealth (0.94)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(3 more...)

Vashurin, Roman, Goloburda, Maiya, Nakov, Preslav, Shelmanov, Artem, Panov, Maxim

CoCoA: A Generalized Approach to Uncertainty Quantification by Integrating Confidence and Consistency of LLM Outputs

Uncertainty quantification (UQ) methods for Large Language Models (LLMs) encompasses a variety of approaches, with two major types being particularly prominent: information-based, which focus on model confidence expressed as token probabilities, and consistency-based, which assess the semantic relationship between multiple outputs generated using repeated sampling. Several recent methods have combined these two approaches and shown impressive performance in various applications. However, they sometimes fail to outperform much simpler baseline methods. Our investigation reveals distinctive characteristics of LLMs as probabilistic models, which help to explain why these UQ methods underperform in certain tasks. Based on these findings, we propose a new way of synthesizing model confidence and output consistency that leads to a family of efficient and robust UQ methods. We evaluate our approach across a variety of tasks such as question answering, abstractive summarization, and machine translation, demonstrating sizable improvements over state-of-the-art UQ approaches.

confidence and consistency, large language model, natural language, (16 more...)

2502.04964

Country:

Europe > Austria (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
North America > United States > Maryland (0.04)
(6 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.88)