AITopics | Nuti, Gaurav

Collaborating Authors

Nuti, Gaurav

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Arctic-Embed 2.0: Multilingual Retrieval Without Compromise

Yu, Puxuan, Merrick, Luke, Nuti, Gaurav, Campos, Daniel

arXiv.org Artificial IntelligenceDec-13-2024

This paper presents the training methodology of Arctic-Embed 2.0, a set of open-source text embedding models built for accurate and efficient multilingual retrieval. While prior works have suffered from degraded English retrieval quality, Arctic-Embed 2.0 delivers competitive retrieval quality on multilingual and English-only benchmarks, and supports Matryoshka Representation Learning (MRL) for efficient embedding storage with significantly lower compressed quality degradation compared to alternatives. We detail the design and implementation, presenting several important open research questions that arose during model development. We conduct experiments exploring these research questions and include extensive discussion aimed at fostering further discussion in this field.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2412.04506

Country:

North America > United States (0.29)
Asia > Middle East > UAE (0.14)

Genre: Research Report > New Finding (0.55)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Arctic-Embed: Scalable, Efficient, and Accurate Text Embedding Models

Merrick, Luke, Xu, Danmei, Nuti, Gaurav, Campos, Daniel

arXiv.org Artificial IntelligenceMay-8-2024

This report describes the training dataset creation and recipe behind the family of \texttt{arctic-embed} text embedding models (a set of five models ranging from 22 to 334 million parameters with weights open-sourced under an Apache-2 license). At the time of their release, each model achieved state-of-the-art retrieval accuracy for models of their size on the MTEB Retrieval leaderboard, with the largest model, arctic-embed-l outperforming closed source embedding models such as Cohere's embed-v3 and Open AI's text-embed-3-large. In addition to the details of our training recipe, we have provided several informative ablation studies, which we believe are the cause of our model performance.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2405.05374

Country: Asia (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback