AITopics | Ghaddar, Abbas

Collaborating Authors

Ghaddar, Abbas

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

CHIQ: Contextual History Enhancement for Improving Query Rewriting in Conversational Search

Mo, Fengran, Ghaddar, Abbas, Mao, Kelong, Rezagholizadeh, Mehdi, Chen, Boxing, Liu, Qun, Nie, Jian-Yun

arXiv.org Artificial IntelligenceJun-7-2024

In this paper, we study how open-source large language models (LLMs) can be effectively deployed for improving query rewriting in conversational search, especially for ambiguous queries. We introduce CHIQ, a two-step method that leverages the capabilities of LLMs to resolve ambiguities in the conversation history before query rewriting. This approach contrasts with prior studies that predominantly use closed-source LLMs to directly generate search queries from conversation history. We demonstrate on five well-established benchmarks that CHIQ leads to state-of-the-art results across most settings, showing highly competitive performances with systems leveraging closed-source LLMs. Our study provides a first step towards leveraging open-source LLMs in conversational search, as a competitive alternative to the prevailing reliance on commercial LLMs. Data, models, and source code will be publicly available upon acceptance at https://github.com/fengranMark/CHIQ.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2406.05013

Country:

North America > United States > California (0.14)
North America > Canada > Quebec (0.14)

Genre: Research Report > Experimental Study (0.66)

Industry:

Leisure & Entertainment (1.00)
Media > Music (0.93)
Health & Medicine > Therapeutic Area (0.68)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

OTTAWA: Optimal TransporT Adaptive Word Aligner for Hallucination and Omission Translation Errors Detection

Huang, Chenyang, Ghaddar, Abbas, Kobyzev, Ivan, Rezagholizadeh, Mehdi, Zaiane, Osmar R., Chen, Boxing

arXiv.org Artificial IntelligenceJun-3-2024

Recently, there has been considerable attention on detecting hallucinations and omissions in Machine Translation (MT) systems. The two dominant approaches to tackle this task involve analyzing the MT system's internal states or relying on the output of external tools, such as sentence similarity or MT quality estimators. In this work, we introduce OTTAWA, a novel Optimal Transport (OT)-based word aligner specifically designed to enhance the detection of hallucinations and omissions in MT systems. Our approach explicitly models the missing alignments by introducing a "null" vector, for which we propose a novel one-side constrained OT setting to allow an adaptive null alignment. Our approach yields competitive results compared to state-of-the-art methods across 18 language pairs on the HalOmi benchmark. In addition, it shows promising features, such as the ability to distinguish between both error types and perform word-level detection without accessing the MT system's internal states.

artificial intelligence, computational linguistic, natural language, (17 more...)

arXiv.org Artificial Intelligence

2406.01919

Country: North America > Canada > Alberta (0.14)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback

CHARP: Conversation History AwaReness Probing for Knowledge-grounded Dialogue Systems

Ghaddar, Abbas, Alfonso-Hermelo, David, Langlais, Philippe, Rezagholizadeh, Mehdi, Chen, Boxing, Parthasarathi, Prasanna

arXiv.org Artificial IntelligenceMay-23-2024

In this work, we dive deep into one of the popular knowledge-grounded dialogue benchmarks that focus on faithfulness, FaithDial. We show that a significant portion of the FaithDial data contains annotation artifacts, which may bias models towards completely ignoring the conversation history. We therefore introduce CHARP, a diagnostic test set, designed for an improved evaluation of hallucinations in conversational model. CHARP not only measures hallucination but also the compliance of the models to the conversation task. Our extensive analysis reveals that models primarily exhibit poor performance on CHARP due to their inability to effectively attend to and reason over the conversation history. Furthermore, the evaluation methods of FaithDial fail to capture these shortcomings, neglecting the conversational history. Our findings indicate that there is substantial room for contribution in both dataset creation and hallucination evaluation for knowledge-grounded dialogue, and that CHARP can serve as a tool for monitoring the progress in this particular research area. CHARP is publicly available at https://huggingface.co/datasets/huawei-noah/CHARP

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2405.1511

Country:

Europe (0.14)
North America > Canada (0.14)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

On the importance of Data Scale in Pretraining Arabic Language Models

Ghaddar, Abbas, Langlais, Philippe, Rezagholizadeh, Mehdi, Chen, Boxing

arXiv.org Artificial IntelligenceJan-15-2024

Pretraining monolingual language models have been proven to be vital for performance in Arabic Natural Language Processing (NLP) tasks. In this paper, we conduct a comprehensive study on the role of data in Arabic Pretrained Language Models (PLMs). More precisely, we reassess the performance of a suite of state-of-the-art Arabic PLMs by retraining them on massive-scale, high-quality Arabic corpora. We have significantly improved the performance of the leading Arabic encoder-only BERT-base and encoder-decoder T5-base models on the ALUE and ORCA leaderboards, thereby reporting state-of-the-art results in their respective model categories. In addition, our analysis strongly suggests that pretraining data by far is the primary contributor to performance, surpassing other factors. Our models and source code are publicly available at https://github.com/huawei-noah/Pretrained-Language-Model/tree/master/JABER-PyTorch.

computational linguistic, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2401.0776

Country:

North America > Canada (0.29)
Europe > Italy (0.28)
North America > United States > Louisiana (0.14)
Asia > Middle East > UAE (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

AraMUS: Pushing the Limits of Data and Model Scale for Arabic Natural Language Processing

Alghamdi, Asaad, Duan, Xinyu, Jiang, Wei, Wang, Zhenhai, Wu, Yimeng, Xia, Qingrong, Wang, Zhefeng, Zheng, Yi, Rezagholizadeh, Mehdi, Huai, Baoxing, Cheng, Peilun, Ghaddar, Abbas

arXiv.org Artificial IntelligenceJun-11-2023

Developing monolingual large Pre-trained Language Models (PLMs) is shown to be very successful in handling different tasks in Natural Language Processing (NLP). In this work, we present AraMUS, the largest Arabic PLM with 11B parameters trained on 529GB of high-quality Arabic textual data. AraMUS achieves state-of-the-art performances on a diverse set of Arabic classification and generative tasks. Moreover, AraMUS shows impressive few-shot learning abilities compared with the best existing Arabic PLMs.

artificial intelligence, machine learning, natural language, (14 more...)

arXiv.org Artificial Intelligence

2306.068

Country:

Europe (0.46)
Asia > Middle East > UAE (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)

Add feedback