
Collaborating Authors: Bornea, Mihaela


Granite Embedding Models

arXiv.org Artificial Intelligence

We introduce the Granite Embedding models, a family of encoder-based embedding models designed for retrieval tasks, spanning dense-retrieval and sparse-retrieval architectures, with both English and multilingual capabilities. This report provides the technical details of training these highly effective 12-layer embedding models, along with their efficient 6-layer distilled counterparts. Extensive evaluations show that the models, developed with techniques like retrieval-oriented pretraining, contrastive finetuning, knowledge distillation, and model merging, significantly outperform publicly available models of similar sizes on internal IBM retrieval and search tasks, and have equivalent performance on widely used information retrieval benchmarks, while being trained on high-quality data suitable for enterprise use. We publicly release all our Granite Embedding models under the Apache 2.0 license, allowing both research and commercial use, at https://huggingface.co/collections/ibm-granite.

Figure 1: Average performance of the Granite Embedding models (in blue) vs. BGE, GTE, Snowflake, E5, and Nomic models on 5 QA and IR datasets: BEIR, ClapNQ, CoIR, RedHat, and UnifiedSearch (the last two are internal IBM datasets).

The goal of text embedding models is to convert variable-length text into a fixed-size vector that encodes the text's semantics, such that semantically close texts lie close in the vector space while dissimilar texts have low similarity. These embeddings can then be used in a variety of tasks, most commonly in retrieval applications, where the relevance of a document to a given query is determined by the similarity of their embeddings (Dunn et al., 2017; Zamani et al., 2018; Xiong et al., 2020; Zhao et al., 2020; Neelakantan et al., 2022), but also in document clustering (Angelov, 2020) and text classification (Sun et al., 2019). See Contributions section for full author list.
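As a concrete illustration of the retrieval use case described above, here is a minimal sketch of scoring query-document relevance with an embedding model via the sentence-transformers library. The model id is an assumption drawn from the ibm-granite Hugging Face collection; any released Granite Embedding checkpoint could be substituted.

```python
# Minimal sketch: query-document similarity with an encoder-based embedding model.
# The model id is an assumption (ibm-granite collection); swap in any released checkpoint.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("ibm-granite/granite-embedding-125m-english")  # assumed id

query = "What is the capital of France?"
docs = [
    "Paris is the capital and most populous city of France.",
    "The Eiffel Tower was completed in 1889.",
]

# Encode variable-length text into fixed-size vectors.
q_emb = model.encode(query, convert_to_tensor=True)
d_emb = model.encode(docs, convert_to_tensor=True)

# Relevance is scored by cosine similarity between embeddings.
scores = util.cos_sim(q_emb, d_emb)
print(scores)
```

In a retrieval system the document embeddings would be precomputed and indexed (e.g., in a vector store), and only the query would be encoded at search time.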


PrimeQA: The Prime Repository for State-of-the-Art Multilingual Question Answering Research and Development

arXiv.org Artificial Intelligence

The field of Question Answering (QA) has made remarkable progress in recent years, thanks to the advent of large pre-trained language models, newer realistic benchmark datasets with leaderboards, and novel algorithms for key components such as retrievers and readers. In this paper, we introduce PRIMEQA: a one-stop, open-source QA repository with an aim to democratize QA research and facilitate easy replication of state-of-the-art (SOTA) QA methods. PRIMEQA supports core QA functionalities like retrieval and reading comprehension, as well as auxiliary capabilities such as question generation. It has been designed as an end-to-end toolkit for various use cases: building front-end applications, replicating SOTA methods on public benchmarks, and expanding pre-existing methods. PRIMEQA is available at https://github.com/primeqa.
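To make the retriever-reader pattern concrete, the sketch below shows the reading-comprehension step using the generic Hugging Face transformers pipeline API. It does not use PRIMEQA's own classes, and the reader checkpoint name is an assumption; it only illustrates the kind of component PRIMEQA packages.

```python
# Illustrative sketch of the reader stage in a retriever-reader QA pipeline.
# Uses the generic transformers pipeline API, not PRIMEQA's classes;
# the checkpoint name is an assumption.
from transformers import pipeline

reader = pipeline("question-answering", model="deepset/roberta-base-squad2")

# A retriever would normally select these passages from a large corpus;
# they are hard-coded here to keep the sketch self-contained.
passages = [
    "PRIMEQA is an open-source repository for question answering research.",
    "Reading comprehension models extract an answer span from a passage.",
]

question = "What does a reading comprehension model do?"
for passage in passages:
    result = reader(question=question, context=passage)
    print(result["answer"], result["score"])
```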


Learning to Transpile AMR into SPARQL

arXiv.org Artificial Intelligence

We propose a transition-based system to transpile Abstract Meaning Representation (AMR) into SPARQL for Knowledge Base Question Answering (KBQA). This allows us to delegate part of the semantic representation to a strongly pre-trained semantic parser, while learning the transpilation with a small amount of paired data. We build on recent work relating AMR and SPARQL constructs, but rather than applying a set of rules, we teach a BART model to selectively use these relations. Further, we avoid explicitly encoding the AMR and instead encode the parser state in the attention mechanism of BART, following recent semantic parsing work. The resulting model is simple, provides supporting text for its decisions, and outperforms recent approaches to KBQA across two knowledge bases: DBpedia (LC-QuAD 1.0, QALD-9) and Wikidata (WebQSP, SWQ-WD).
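As an illustration of the source and target representations involved, the sketch below pairs a question and a simplified AMR with a hand-written DBpedia SPARQL query and executes it with SPARQLWrapper. The entity, relation, and query are illustrative assumptions, not the paper's actual transpiler output.

```python
# Minimal sketch of the AMR-to-SPARQL setting: a question whose AMR parse
# contains an authorship relation is mapped to a DBpedia query.
# The query, entity, and relation below are illustrative assumptions.
from SPARQLWrapper import SPARQLWrapper, JSON

# Question: "Who wrote The Hobbit?"
# Simplified AMR: (w / write-01
#                    :ARG0 (p / person)
#                    :ARG1 (b / book :name (n / name :op1 "The" :op2 "Hobbit")))
query = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?person WHERE {
  dbr:The_Hobbit dbo:author ?person .
}
"""

endpoint = SPARQLWrapper("https://dbpedia.org/sparql")
endpoint.setQuery(query)
endpoint.setReturnFormat(JSON)
print(endpoint.query().convert())
```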


Generative Relation Linking for Question Answering over Knowledge Bases

arXiv.org Artificial Intelligence

The goal of Knowledge Base Question Answering (KBQA) systems is to transform natural language questions into SPARQL queries that are then used to retrieve answer(s) from the target Knowledge Base (KB). Relation linking is a crucial component in building KBQA systems: it identifies the relations expressed in the question and maps them to the corresponding KB relations. For example, in Figure 1, to translate the question "What is the owning organization of the Ford Kansas City Assembly Plant and also the builder of the Ford Y-block engine?" into its corresponding SPARQL query, it is necessary to determine the two KB relations dbo:owningOrganisation and dbo:manufacturer. Relation linking has proven to be a challenging problem, with state-of-the-art approaches achieving less than 50% F1 on the majority of datasets Sakor et al. [2019], Lin et al. [2020], Mihindukulasooriya et al. [2020], thus making it a bottleneck for the overall performance of KBQA systems.

The challenges primarily arise from the following factors: 1) relations in text and in the KB are often lexicalized differently (implicit mentions); 2) questions often express multiple relations; and 3) training data is often limited. While past approaches have tried to tackle these issues by either creating hand-coded rules Sakor et al. [2020] or by using semantic parsing Mihindukulasooriya et al. [2020], these challenges can be naturally addressed using the latest advances in auto-regressive sequence-to-sequence (seq2seq) models, which have been shown to perform surprisingly well, in a generative fashion, on tasks such as question answering Lewis et al. [2020a], slot filling Petroni et al. [2020], and entity linking Cao et al. [2020]. However, seq2seq models have not yet been explored for relation linking, particularly in the context of KBQA. In this work, we introduce GenRL, a novel generative approach for relation linking that capitalises on pre-trained seq2seq models.
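To illustrate the generative formulation, the sketch below shows the seq2seq generation pattern with a BART checkpoint from Hugging Face transformers. The base checkpoint and the input/output format are assumptions: GenRL itself is a fine-tuned model whose target sequence would contain KB relation labels, which a stock BART will not produce.

```python
# Illustrative sketch of generative relation linking with a seq2seq model.
# The checkpoint name and input/output format are assumptions; GenRL would
# use a checkpoint fine-tuned to emit KB relations as the target sequence.
from transformers import BartForConditionalGeneration, BartTokenizer

model_name = "facebook/bart-large"  # stand-in for a fine-tuned relation-linking model
tokenizer = BartTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name)

question = ("What is the owning organization of the Ford Kansas City Assembly "
            "Plant and also the builder of the Ford Y-block engine?")

inputs = tokenizer(question, return_tensors="pt")
# A fine-tuned model would generate relation labels such as
# "dbo:owningOrganisation dbo:manufacturer" as the output sequence.
output_ids = model.generate(**inputs, max_new_tokens=32, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```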