Rosenthal, Sara
Granite Embedding Models
Awasthy, Parul, Trivedi, Aashka, Li, Yulong, Bornea, Mihaela, Cox, David, Daniels, Abraham, Franz, Martin, Goodhart, Gabe, Iyer, Bhavani, Kumar, Vishwajeet, Lastras, Luis, McCarley, Scott, Murthy, Rudra, P, Vignesh, Rosenthal, Sara, Roukos, Salim, Sen, Jaydeep, Sharma, Sukriti, Sil, Avirup, Soule, Kate, Sultan, Arafat, Florian, Radu
We introduce the Granite Embedding models, a family of encoder-based embedding models designed for retrieval tasks, spanning dense-retrieval and sparse-retrieval architectures, with both English and multilingual capabilities. This report provides the technical details of training these highly effective 12-layer embedding models, along with their efficient 6-layer distilled counterparts. Extensive evaluations show that the models, developed with techniques such as retrieval-oriented pretraining, contrastive finetuning, knowledge distillation, and model merging, significantly outperform publicly available models of similar sizes on both internal IBM retrieval and search tasks, and have equivalent performance on widely used information retrieval benchmarks, while being trained on high-quality data suitable for enterprise use. We publicly release all our Granite Embedding models under the Apache 2.0 license, allowing both research and commercial use, at https://huggingface.co/collections/ibm-granite.

Figure 1: Average performance of the Granite Embedding models (in blue) vs. BGE, GTE, Snowflake, E5, and Nomic models on 5 QA and IR datasets: BEIR, ClapNQ, CoIR, RedHat, and UnifiedSearch (the last 2 are internal IBM datasets).

The goal of text embedding models is to convert variable-length text into a fixed vector, encoding the text semantics into a multidimensional vector such that semantically close texts are close in the vector space, while dissimilar texts have a low similarity. These embeddings can then be used in a variety of tasks, most commonly in retrieval applications, where the relevance of a document to a given query can be determined by the similarity of their embeddings (Dunn et al., 2017; Xiong et al., 2020; Neelakantan et al., 2022; Zamani et al., 2018; Zhao et al., 2020), but also in document clustering (Angelov, 2020) and text classification (Sun et al., 2019).
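As a concrete illustration of the dense-retrieval use described above, the sketch below embeds a query and a few documents with one of the released models and ranks the documents by cosine similarity. The model ID ibm-granite/granite-embedding-125m-english is taken from the public collection linked above, and the SentenceTransformer loading path is an assumption for illustration, not the training or evaluation setup from the report.

```python
# Minimal dense-retrieval sketch: rank documents by cosine similarity of their
# embeddings to a query embedding. The model ID is assumed from the public
# ibm-granite Hugging Face collection; any sentence-embedding model works here.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("ibm-granite/granite-embedding-125m-english")

query = "How do I reset my account password?"
documents = [
    "To reset your password, open Settings and choose 'Forgot password'.",
    "Our quarterly revenue grew by 12% year over year.",
    "Password resets require access to the email address on file.",
]

# Encode query and documents into fixed-size vectors (normalized for cosine similarity).
query_emb = model.encode(query, normalize_embeddings=True)
doc_embs = model.encode(documents, normalize_embeddings=True)

# Cosine similarity between the query and each document.
scores = util.cos_sim(query_emb, doc_embs)[0]

# Print documents from most to least relevant.
for idx in scores.argsort(descending=True).tolist():
    print(f"{scores[idx]:.3f}  {documents[idx]}")
```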
MTRAG: A Multi-Turn Conversational Benchmark for Evaluating Retrieval-Augmented Generation Systems
Katsis, Yannis, Rosenthal, Sara, Fadnis, Kshitij, Gunasekara, Chulaka, Lee, Young-Suk, Popa, Lucian, Shah, Vraj, Zhu, Huaiyu, Contractor, Danish, Danilevsky, Marina
Retrieval-augmented generation (RAG) has recently become a very popular task for Large Language Models (LLMs). Evaluating LLMs on multi-turn RAG conversations, where the system is asked to generate a response to a question in the context of a preceding conversation, is an important and often overlooked task that poses several additional challenges. We present MTRAG: an end-to-end human-generated multi-turn RAG benchmark that reflects several real-world properties across diverse dimensions for evaluating the full RAG pipeline. MTRAG contains 110 conversations averaging 7.7 turns each across four domains, for a total of 842 tasks. We also explore automation paths via synthetic data and LLM-as-a-Judge evaluation. Our human and automatic evaluations show that even state-of-the-art LLM RAG systems struggle on MTRAG. We demonstrate the need for strong retrieval and generation systems that can handle later turns, unanswerable questions, non-standalone questions, and multiple domains. MTRAG is available at https://github.com/ibm/mt-rag-benchmark.
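As an illustration of the multi-turn setting MTRAG evaluates, the sketch below builds the retrieval query for a later turn from the preceding user turns, so that non-standalone questions still retrieve useful context. This is a minimal sketch of the task setup, not the MTRAG harness; the toy keyword retriever and the example conversation are assumptions for illustration.

```python
# Illustrative sketch of multi-turn RAG: the retrieval query for a later turn
# is built from earlier user turns so that a non-standalone question such as
# "How long until I get my money back?" still retrieves a relevant passage.
# This is NOT the MTRAG harness; the keyword retriever below is a toy stand-in.
from typing import Dict, List

CORPUS = [
    "You can cancel a subscription from the Billing page of your account.",
    "Refunds are issued within 5-7 business days, so you get your money back after cancellation.",
    "Two-factor authentication can be enabled under Security settings.",
]

def build_retrieval_query(conversation: List[Dict[str, str]], max_turns: int = 3) -> str:
    """Concatenate the most recent user turns into a more standalone query."""
    user_turns = [t["text"] for t in conversation if t["role"] == "user"]
    return " ".join(user_turns[-max_turns:])

def keyword_retrieve(query: str, corpus: List[str], k: int = 2) -> List[str]:
    """Toy retriever: rank passages by word overlap with the query."""
    q_words = set(query.lower().split())
    return sorted(corpus, key=lambda p: -len(q_words & set(p.lower().split())))[:k]

conversation = [
    {"role": "user", "text": "How do I cancel my subscription?"},
    {"role": "agent", "text": "Go to the Billing page and choose Cancel."},
    {"role": "user", "text": "How long until I get my money back?"},  # non-standalone turn
]

query = build_retrieval_query(conversation)
for passage in keyword_retrieve(query, CORPUS):
    print(passage)  # retrieved context that a generator would condition on
```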
CLAPNQ: Cohesive Long-form Answers from Passages in Natural Questions for RAG systems
Rosenthal, Sara, Sil, Avirup, Florian, Radu, Roukos, Salim
Large scale research in this area began with the tasks of Machine Reading Comprehension (Rajpurkar et al., 2016; Rogers et al., 2023; Fisch et al., 2021), and Information Retrieval (Manning et al., 2008; Voorhees and Harman, 2005; Thakur et al., 2021) and has more recently come to be known as Retrieval Augmented Generation (Lewis et al., 2021; Guu et al., 2020) which encompasses both tasks. The recent popularity of generative AI with Large Language models (LLM), such as GPT (Brown et al., 2020), Llama (Touvron et al., [...]

[...] (NQ) (Kwiatkowski et al., 2019) and SQuAD (Rajpurkar et al., 2016, 2018), which are just a few words. It is grounded on a single gold passage, in contrast to other long-form question answering (LFQA) datasets such as ELI5 (Fan et al., 2019) where gold passages are not available. It is built from a subset of the highly successful Natural Questions (Kwiatkowski et al., 2019) dataset for extractive QA from Wikipedia documents based on users' real web search queries - specifically, the subset of NQ that has long answers (passages) but no short extractive answers.
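The selection criterion described above (keeping NQ questions that have a long-answer passage but no short extractive answer) can be sketched as a simple filter. The record layout and field names below are simplified assumptions for illustration, not the actual CLAPNQ construction code.

```python
# Hedged sketch of the selection criterion: keep questions whose annotation has
# a long answer (gold passage) but no short extractive answer. The record
# layout is a simplified, hypothetical view of NQ annotations.
from typing import Dict, List

def has_long_but_no_short_answer(example: Dict) -> bool:
    annotation = example["annotation"]
    has_long = annotation.get("long_answer_passage") is not None
    has_short = len(annotation.get("short_answers", [])) > 0
    return has_long and not has_short

nq_examples: List[Dict] = [
    {"question": "who wrote the declaration of independence",
     "annotation": {"long_answer_passage": "The Declaration was drafted by ...",
                    "short_answers": ["Thomas Jefferson"]}},      # has a short answer -> dropped
    {"question": "why is the sky blue",
     "annotation": {"long_answer_passage": "Sunlight is scattered by molecules ...",
                    "short_answers": []}},                        # long answer only -> kept
]

clapnq_candidates = [ex for ex in nq_examples if has_long_but_no_short_answer(ex)]
print([ex["question"] for ex in clapnq_candidates])
```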
Muted: Multilingual Targeted Offensive Speech Identification and Visualization
Tillmann, Christoph, Trivedi, Aashka, Rosenthal, Sara, Borse, Santosh, Zhang, Rong, Sil, Avirup, Bhattacharjee, Bishwaranjan
Offensive language such as hate, abuse, and profanity (HAP) occurs in various content on the web. While previous work has mostly dealt with sentence-level annotations, there have been a few recent attempts to identify offensive spans as well. We build upon this work and introduce Muted, a system to identify multilingual HAP content by displaying offensive arguments and their targets using heatmaps to indicate their intensity. Muted can leverage any transformer-based HAP-classification model and its attention mechanism out-of-the-box to identify toxic spans, without further fine-tuning. In addition, we use the spaCy library to identify the specific targets and arguments for the words predicted by the attention heatmaps. We present the model's performance on identifying offensive spans and their targets in existing datasets and present new annotations on German text. Finally, we demonstrate our proposed visualization tool on multilingual inputs.
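A minimal sketch of the attention-based span idea: run a transformer toxicity classifier once with attentions enabled, read how much attention the [CLS] token pays to each word, and flag high-attention words as candidate offensive spans. The stand-in model unitary/toxic-bert and the threshold are assumptions for illustration; the paper's HAP models and its spaCy-based target extraction are not reproduced here.

```python
# Sketch of the attention-heatmap idea: use a classifier's own attention,
# without fine-tuning, to surface candidate offensive words.
# "unitary/toxic-bert" is a public toxicity classifier used as a stand-in for
# the paper's HAP models; the threshold is an illustrative choice.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "unitary/toxic-bert"  # stand-in classifier, not the paper's model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, output_attentions=True)
model.eval()

text = "You are a complete idiot and nobody likes you."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Last-layer attention averaged over heads: shape (seq_len, seq_len).
attn = outputs.attentions[-1][0].mean(dim=0)
cls_to_tokens = attn[0]  # attention from [CLS] to every token

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
threshold = cls_to_tokens.mean() * 1.5  # simple illustrative cutoff
flagged = [tok for tok, score in zip(tokens, cls_to_tokens)
           if score > threshold and tok not in ("[CLS]", "[SEP]")]
print(flagged)  # candidate offensive span tokens
```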
PrimeQA: The Prime Repository for State-of-the-Art Multilingual Question Answering Research and Development
Sil, Avirup, Sen, Jaydeep, Iyer, Bhavani, Franz, Martin, Fadnis, Kshitij, Bornea, Mihaela, Rosenthal, Sara, McCarley, Scott, Zhang, Rong, Kumar, Vishwajeet, Li, Yulong, Sultan, Md Arafat, Bhat, Riyaz, Florian, Radu, Roukos, Salim
The field of Question Answering (QA) has made remarkable progress in recent years, thanks to the advent of large pre-trained language models, newer realistic benchmark datasets with leaderboards, and novel algorithms for key components such as retrievers and readers. In this paper, we introduce PRIMEQA: a one-stop and open-source QA repository with an aim to democratize QA research and facilitate easy replication of state-of-the-art (SOTA) QA methods. PRIMEQA supports core QA functionalities like retrieval and reading comprehension as well as auxiliary capabilities such as question generation. It has been designed as an end-to-end toolkit for various use cases: building front-end applications, replicating SOTA methods on public benchmarks, and expanding pre-existing methods. PRIMEQA is available at https://github.com/primeqa.
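PRIMEQA's own APIs are not reproduced here; as a generic illustration of the retriever-plus-reader pattern the toolkit packages, the sketch below pairs a toy word-overlap retriever with an off-the-shelf Hugging Face extractive-QA pipeline. The reader model and corpus are assumptions for illustration.

```python
# Generic retriever + reader illustration of the pattern PRIMEQA packages;
# this uses a standard Hugging Face QA pipeline, NOT PRIMEQA's own API.
from transformers import pipeline

# Reader: an off-the-shelf extractive QA model (assumed, for illustration).
reader = pipeline("question-answering", model="deepset/roberta-base-squad2")

corpus = [
    "PrimeQA is an open-source repository for question answering research.",
    "The Eiffel Tower was completed in 1889 and is located in Paris.",
    "Mount Everest is the highest mountain above sea level.",
]

def retrieve(question: str, docs, k: int = 1):
    """Toy retriever: rank documents by word overlap with the question."""
    q_words = set(question.lower().split())
    return sorted(docs, key=lambda d: -len(q_words & set(d.lower().split())))[:k]

question = "When was the Eiffel Tower completed?"
context = " ".join(retrieve(question, corpus))
answer = reader(question=question, context=context)
print(answer["answer"], answer["score"])  # extracted span and reader confidence
```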