AITopics | deva

Collaborating Authors

deva

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

HYPEROFA: Expanding LLM Vocabulary to New Languages via Hypernetwork-Based Embedding Initialization

Özeren, Enes, Liu, Yihong, Schütze, Hinrich

arXiv.org Artificial IntelligenceJul-15-2025

Many pre-trained language models (PLMs) exhibit suboptimal performance on mid- and low-resource languages, largely due to limited exposure to these languages during pre-training. A common strategy to address this is to introduce new tokens specific to the target languages, initialize their embeddings, and apply continual pre-training on target-language data. Among such methods, OFA (Liu et al., 2024a) proposes a similarity-based subword embedding initialization heuristic that is both effective and efficient. However, OFA restricts target-language token embeddings to be convex combinations of a fixed number of source-language embeddings, which may limit expressiveness. To overcome this limitation, we propose HYPEROFA, a hypernetwork-based approach for more adaptive token embedding initialization. The hypernetwork is trained to map from an external multilingual word vector space to the PLMs token embedding space using source-language tokens. Once trained, it can generate flexible embeddings for target-language tokens, serving as a good starting point for continual pretraining. Experiments demonstrate that HYPEROFA consistently outperforms random initialization baseline and matches or exceeds the performance of OFA in both continual pre-training convergence and downstream task performance. We make the code publicly available.

large language model, latn, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2504.21018

Country:

North America > United States > Florida > Miami-Dade County > Miami (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
(20 more...)

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.82)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.67)

Add feedback

DialUp! Modeling the Language Continuum by Adapting Models to Dialects and Dialects to Models

Bafna, Niyati, Chang, Emily, Robinson, Nathaniel R., Mortensen, David R., Murray, Kenton, Yarowsky, David, Sirin, Hale

arXiv.org Artificial IntelligenceJan-27-2025

Most of the world's languages and dialects are low-resource, and lack support in mainstream machine translation (MT) models. However, many of them have a closely-related high-resource language (HRL) neighbor, and differ in linguistically regular ways from it. This underscores the importance of model robustness to dialectical variation and cross-lingual generalization to the HRL dialect continuum. We present DialUp, consisting of a training-time technique for adapting a pretrained model to dialectical data (M->D), and an inference-time intervention adapting dialectical data to the model expertise (D->M). M->D induces model robustness to potentially unseen and unknown dialects by exposure to synthetic data exemplifying linguistic mechanisms of dialectical variation, whereas D->M treats dialectical divergence for known target dialects. These methods show considerable performance gains for several dialects from four language families, and modest gains for two other language families. We also conduct feature and error analyses, which show that language varieties with low baseline MT performance are more likely to benefit from these approaches.

large language model, latn, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2501.16581

Country:

Europe > Sweden (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Philippines > Luzon > Ilocos Region > Province of Pangasinan (0.05)
(21 more...)

Genre:

Personal > Honors (0.46)
Research Report > New Finding (0.45)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Shiksha: A Technical Domain focused Translation Dataset and Model for Indian Languages

Joglekar, Advait, Umesh, Srinivasan

arXiv.org Artificial IntelligenceDec-12-2024

Neural Machine Translation (NMT) models are typically trained on datasets with limited exposure to Scientific, Technical and Educational domains. Translation models thus, in general, struggle with tasks that involve scientific understanding or technical jargon. Their performance is found to be even worse for low-resource Indian languages. Finding a translation dataset that tends to these domains in particular, poses a difficult challenge. In this paper, we address this by creating a multilingual parallel corpus containing more than 2.8 million rows of English-to-Indic and Indic-to-Indic high-quality translation pairs across 8 Indian languages. We achieve this by bitext mining human-translated transcriptions of NPTEL video lectures. We also finetune and evaluate NMT models using this corpus and surpass all other publicly available models at in-domain tasks. We also demonstrate the potential for generalizing to out-of-domain translation tasks by improving the baseline by over 2 BLEU on average for these Indian languages on the Flores+ benchmark. We are pleased to release our model and dataset via this link: https://huggingface.co/SPRINGLab.

artificial intelligence, machine translation, natural language, (18 more...)

arXiv.org Artificial Intelligence

2412.09025

Country:

Asia > India (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
(3 more...)

Genre: Research Report (0.64)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback

NLIP_Lab-IITH Multilingual MT System for WAT24 MT Shared Task

Brahma, Maharaj, Sahoo, Pramit, Desarkar, Maunendra Sankar

arXiv.org Artificial IntelligenceOct-17-2024

This paper describes NLIP Lab's multilingual machine translation system for the WAT24 shared task on multilingual Indic MT task for 22 scheduled languages belonging to 4 language families. We explore pre-training for Indic languages using alignment agreement objectives. We utilize bi-lingual dictionaries to substitute words from source sentences. Furthermore, we fine-tuned language direction-specific multilingual translation models using small and high-quality seed data. Our primary submission is a 243M parameters multilingual translation model covering 22 Indic languages. In the IN22-Gen benchmark, we achieved an average chrF++ score of 46.80 and 18.19 BLEU score for the En-Indic direction. In the Indic-En direction, we achieved an average chrF++ score of 56.34 and 30.82 BLEU score. In the In22-Conv benchmark, we achieved an average chrF++ score of 43.43 and BLEU score of 16.58 in the En-Indic direction, and in the Indic-En direction, we achieved an average of 52.44 and 29.77 for chrF++ and BLEU respectively. Our model\footnote{Our code and models are available at \url{https://github.com/maharajbrahma/WAT2024-MultiIndicMT}} is competitive with IndicTransv1 (474M parameter model).

artificial intelligence, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2410.13443

Country:

North America > United States > Pennsylvania (0.04)
Europe > Denmark > Capital Region > Copenhagen (0.04)
Asia > India > Telangana > Hyderabad (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Navigating Text-to-Image Generative Bias across Indic Languages

Mittal, Surbhi, Sudan, Arnav, Vatsa, Mayank, Singh, Richa, Glaser, Tamar, Hassner, Tal

arXiv.org Artificial IntelligenceAug-1-2024

This research investigates biases in text-to-image (TTI) models for the Indic languages widely spoken across India. It evaluates and compares the generative performance and cultural relevance of leading TTI models in these languages against their performance in English. Using the proposed IndicTTI benchmark, we comprehensively assess the performance of 30 Indic languages with two open-source diffusion models and two commercial generation APIs. The primary objective of this benchmark is to evaluate the support for Indic languages in these models and identify areas needing improvement. Given the linguistic diversity of 30 languages spoken by over 1.4 billion people, this benchmark aims to provide a detailed and insightful analysis of TTI models' effectiveness within the Indic linguistic landscape.

benchmark, deva, indic language, (14 more...)

arXiv.org Artificial Intelligence

2408.00283

Country:

North America > United States > California > San Mateo County > Menlo Park (0.04)
North America > United States > California > Alameda County > Alameda (0.04)
Asia > Indonesia > Bali (0.04)
(3 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.46)

Add feedback

Video: Learning Machine Learning with .NET, PyTorch and the ONNX Runtime – Le Café Central de DeVa

#artificialintelligenceFeb-8-2019, 10:19:53 GMT

ONNX is a open format to represent deep learning models that is supported by various frameworks and tools. This format makes it easier to interoperate between frameworks and to maximize the reach of your hardware optimization investments. In this episode, Seth Juarez sits with Rich to show us how we can use the ONNX runtime inside of our .NET applications. He gives us a quick introduction to training a model with PyTorch, and also explains some foundational concepts around prediction accuracy.

artificial intelligence, deep learning, learning machine learning, (2 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback