AITopics | Paul, Debjit

Plotting

Paul, Debjit

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Creativity in AI: Progresses and Challenges

Ismayilzada, Mete, Paul, Debjit, Bosselut, Antoine, van der Plas, Lonneke

arXiv.org Artificial IntelligenceDec-9-2024

Creativity is the ability to produce novel, useful, and surprising ideas, and has been widely studied as a crucial aspect of human cognition. Machine creativity on the other hand has been a long-standing challenge. With the rise of advanced generative AI, there has been renewed interest and debate regarding AI's creative capabilities. Therefore, it is imperative to revisit the state of creativity in AI and identify key progresses and remaining challenges. In this work, we survey leading works studying the creative capabilities of AI systems, focusing on creative problem-solving, linguistic, artistic, and scientific creativity. Our review suggests that while the latest AI models are largely capable of producing linguistically and artistically creative outputs such as poems, images, and musical pieces, they struggle with tasks that require creative problem-solving, abstract thinking and compositionality and their generations suffer from a lack of diversity, originality, long-range incoherence and hallucinations. We also discuss key questions concerning copyright and authorship issues with generative models. Furthermore, we highlight the need for a comprehensive evaluation of creativity that is process-driven and considers several dimensions of creativity. Finally, we propose future research directions to improve the creativity of AI outputs, drawing inspiration from cognitive science and psychology.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2410.17218

Country:

Europe (1.00)
Asia (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
(3 more...)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.92)
Research Report > New Finding (0.67)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)
Law > Intellectual Property & Technology Law (1.00)
(3 more...)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(6 more...)

Add feedback

INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge

Romanou, Angelika, Foroutan, Negar, Sotnikova, Anna, Chen, Zeming, Nelaturu, Sree Harsha, Singh, Shivalika, Maheshwary, Rishabh, Altomare, Micol, Haggag, Mohamed A., A, Snegha, Amayuelas, Alfonso, Amirudin, Azril Hafizi, Aryabumi, Viraat, Boiko, Danylo, Chang, Michael, Chim, Jenny, Cohen, Gal, Dalmia, Aditya Kumar, Diress, Abraham, Duwal, Sharad, Dzenhaliou, Daniil, Florez, Daniel Fernando Erazo, Farestam, Fabian, Imperial, Joseph Marvin, Islam, Shayekh Bin, Isotalo, Perttu, Jabbarishiviari, Maral, Karlsson, Börje F., Khalilov, Eldar, Klamm, Christopher, Koto, Fajri, Krzemiński, Dominik, de Melo, Gabriel Adriano, Montariol, Syrielle, Nan, Yiyang, Niklaus, Joel, Novikova, Jekaterina, Ceron, Johan Samir Obando, Paul, Debjit, Ploeger, Esther, Purbey, Jebish, Rajwal, Swati, Ravi, Selvan Sunitha, Rydell, Sara, Santhosh, Roshan, Sharma, Drishti, Skenduli, Marjana Prifti, Moakhar, Arshia Soltani, Moakhar, Bardia Soltani, Tamir, Ran, Tarun, Ayush Kumar, Wasi, Azmine Toushik, Weerasinghe, Thenuka Ovin, Yilmaz, Serhan, Zhang, Mike, Schlag, Imanol, Fadaee, Marzieh, Hooker, Sara, Bosselut, Antoine

arXiv.org Artificial IntelligenceNov-29-2024

The performance differential of large language models (LLM) between languages hinders their effective deployment in many regions, inhibiting the potential economic and societal value of generative AI tools in many communities. However, the development of functional LLMs in many languages (i.e., multilingual LLMs) is bottlenecked by the lack of high-quality evaluation resources in languages other than English. Moreover, current practices in multilingual benchmark construction often translate English resources, ignoring the regional and cultural knowledge of the environments in which multilingual systems would be used. In this work, we construct an evaluation suite of 197,243 QA pairs from local exam sources to measure the capabilities of multilingual LLMs in a variety of regional contexts. The rapid advancement of AI technologies underscores the importance of developing LLMs that are proficient across diverse linguistic and cultural contexts, ensuring fair and equitable performance for stakeholders from various language groups. However, the lack of high-quality evaluation benchmarks in many languages discourages practitioners from training multilingual LLMs to meet this challenge. This evaluation gap limits the effective deployment of LLMs for many regions, exacerbates digital divides, and inhibits the economic and societal value of AI tools in many underserved communities. The source of this gap is the multitude of challenges in evaluating LLMs for multilingual contexts. First, at a meta-level, the majority of benchmarks for LLMs are only in English (Hendrycks et al., 2020, inter alia). Technical challenges also abound due to the manner in which multilingual datasets are often collected. Certain datasets are constructed using manually applied templates, resulting in low prompt and completion diversity (Muennighoff et al., 2022). Many more are composed of translations from high-resource languages (e.g., English; Holtermann et al., 2024; Myung et al., 2024; Lai et al., 2023; Foroutan et al., 2023). These datasets often contain errors (Ponti et al., 2020; Plaza et al., 2024) and create translationese artifacts (Vanmassenhove et al., 2021; Hartung et al., 2023; Savoldi et al., 2021; Ji et al., 2023).

large language model, machine learning, nclude, (18 more...)

arXiv.org Artificial Intelligence

2411.19799

Country:

North America > United States (0.46)
Europe > Middle East > Malta (0.14)
Asia > Middle East > UAE (0.14)

Genre: Research Report > New Finding (0.67)

Industry:

Government (1.00)
Education > Curriculum > Subject-Specific Education (0.92)
Education > Educational Setting (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Entity Insertion in Multilingual Linked Corpora: The Case of Wikipedia

Feith, Tomás, Arora, Akhil, Gerlach, Martin, Paul, Debjit, West, Robert

arXiv.org Artificial IntelligenceOct-5-2024

Links are a fundamental part of information networks, turning isolated pieces of knowledge into a network of information that is much richer than the sum of its parts. However, adding a new link to the network is not trivial: it requires not only the identification of a suitable pair of source and target entities but also the understanding of the content of the source to locate a suitable position for the link in the text. The latter problem has not been addressed effectively, particularly in the absence of text spans in the source that could serve as anchors to insert a link to the target entity. To bridge this gap, we introduce and operationalize the task of entity insertion in information networks. Focusing on the case of Wikipedia, we empirically show that this problem is, both, relevant and challenging for editors. We compile a benchmark dataset in 105 languages and develop a framework for entity insertion called LocEI (Localized Entity Insertion) and its multilingual variant XLocEI. We show that XLocEI outperforms all baseline models (including state-of-the-art prompt-based ranking with LLMs such as GPT-4) and that it can be applied in a zero-shot manner on languages not seen during training with minimal performance drop. These findings are important for applying entity insertion models in practice, e.g., to support editors in adding links across the more than 300 language versions of Wikipedia.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2410.04254

Country: North America > United States (0.67)

Genre: Research Report > New Finding (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (0.68)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Add feedback

Making Reasoning Matter: Measuring and Improving Faithfulness of Chain-of-Thought Reasoning

Paul, Debjit, West, Robert, Bosselut, Antoine, Faltings, Boi

arXiv.org Artificial IntelligenceFeb-23-2024

Large language models (LLMs) have been shown to perform better when asked to reason step-by-step before answering a question. However, it is unclear to what degree the model's final answer is faithful to the stated reasoning steps. In this paper, we perform a causal mediation analysis on twelve LLMs to examine how intermediate reasoning steps generated by the LLM influence the final outcome and find that LLMs do not reliably use their intermediate reasoning steps when generating an answer. To address this issue, we introduce FRODO, a framework to tailor small-sized LMs to generate correct reasoning steps and robustly reason over these steps. FRODO consists of an inference module that learns to generate correct reasoning steps using an implicit causal reward function and a reasoning module that learns to faithfully reason over these intermediate inferences using a counterfactual and causal preference objective. Our experiments show that FRODO significantly outperforms four competitive baselines. Furthermore, FRODO improves the robustness and generalization ability of the reasoning LM, yielding higher performance on out-of-distribution test sets. Finally, we find that FRODO's rationales are more faithful to its final answer predictions than standard supervised fine-tuning.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2402.1395

Country: Asia > Middle East > UAE (0.14)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)

Add feedback

{\delta}-CAUSAL: Exploring Defeasibility in Causal Reasoning

Cui, Shaobo, Milikic, Lazar, Feng, Yiyang, Ismayilzada, Mete, Paul, Debjit, Bosselut, Antoine, Faltings, Boi

arXiv.org Artificial IntelligenceJan-6-2024

Defeasibility in causal reasoning implies that the causal relationship between cause and effect can be strengthened or weakened. Namely, the causal strength between cause and effect should increase or decrease with the incorporation of strengthening arguments (supporters) or weakening arguments (defeaters), respectively. However, existing works ignore defeasibility in causal reasoning and fail to evaluate existing causal strength metrics in defeasible settings. In this work, we present {\delta}-CAUSAL, the first benchmark dataset for studying defeasibility in causal reasoning. {\delta}-CAUSAL includes around 11K events spanning ten domains, featuring defeasible causality pairs, i.e., cause-effect pairs accompanied by supporters and defeaters. We further show current causal strength metrics fail to reflect the change of causal strength with the incorporation of supporters or defeaters in {\delta}-CAUSAL. To this end, we propose CESAR (Causal Embedding aSsociation with Attention Rating), a metric that measures causal strength based on token-level causal relationships. CESAR achieves a significant 69.7% relative improvement over existing metrics, increasing from 47.2% to 80.1% in capturing the causal strength change brought by supporters and defeaters. We further demonstrate even Large Language Models (LLMs) like GPT-3.5 still lag 4.5 and 10.7 points behind humans in generating supporters and defeaters, emphasizing the challenge posed by {\delta}-CAUSAL.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2401.03183

Country:

Europe (1.00)
Africa (0.67)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Leisure & Entertainment (1.00)
Health & Medicine > Therapeutic Area (0.93)
Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Model-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.91)

Add feedback

CRAB: Assessing the Strength of Causal Relationships Between Real-world Events

Romanou, Angelika, Montariol, Syrielle, Paul, Debjit, Laugier, Leo, Aberer, Karl, Bosselut, Antoine

arXiv.org Artificial IntelligenceNov-7-2023

Understanding narratives requires reasoning about the cause-and-effect relationships between events mentioned in the text. While existing foundation models yield impressive results in many NLP tasks requiring reasoning, it is unclear whether they understand the complexity of the underlying network of causal relationships of events in narratives. In this work, we present CRAB, a new Causal Reasoning Assessment Benchmark designed to evaluate causal understanding of events in real-world narratives. CRAB contains fine-grained, contextual causality annotations for ~2.7K pairs of real-world events that describe various newsworthy event timelines (e.g., the acquisition of Twitter by Elon Musk). Using CRAB, we measure the performance of several large language models, demonstrating that most systems achieve poor performance on the task. Motivated by classical causal principles, we also analyze the causal structures of groups of events in CRAB, and find that models perform worse on causal reasoning when events are derived from complex causal structures compared to simple linear causal chains. We make our dataset and code available to the research community.

artificial intelligence, causal relationship, natural language, (3 more...)

arXiv.org Artificial Intelligence

2311.04284

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language (0.87)

Add feedback

CRoW: Benchmarking Commonsense Reasoning in Real-World Tasks

Ismayilzada, Mete, Paul, Debjit, Montariol, Syrielle, Geva, Mor, Bosselut, Antoine

arXiv.org Artificial IntelligenceOct-23-2023

Recent efforts in natural language processing (NLP) commonsense reasoning research have yielded a considerable number of new datasets and benchmarks. However, most of these datasets formulate commonsense reasoning challenges in artificial scenarios that are not reflective of the tasks which real-world NLP systems are designed to solve. In this work, we present CRoW, a manually-curated, multi-task benchmark that evaluates the ability of models to apply commonsense reasoning in the context of six real-world NLP tasks. CRoW is constructed using a multi-stage data collection pipeline that rewrites examples from existing datasets using commonsense-violating perturbations. We use CRoW to study how NLP systems perform across different dimensions of commonsense knowledge, such as physical, temporal, and social reasoning. We find a significant performance gap when NLP systems are evaluated on CRoW compared to humans, showcasing that commonsense reasoning is far from being solved in real-world task settings. We make our dataset and leaderboard available to the research community at https://github.com/mismayil/crow.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2310.15239

Country:

Europe (1.00)
Asia (0.68)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (0.45)

Industry:

Health & Medicine > Therapeutic Area > Immunology (0.47)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Add feedback

Flows: Building Blocks of Reasoning and Collaborating AI

Josifoski, Martin, Klein, Lars, Peyrard, Maxime, Li, Yifei, Geng, Saibo, Schnitzler, Julian Paul, Yao, Yuxing, Wei, Jiheng, Paul, Debjit, West, Robert

arXiv.org Artificial IntelligenceAug-2-2023

Recent advances in artificial intelligence (AI) have produced highly capable and controllable systems. This creates unprecedented opportunities for structured reasoning as well as collaboration among multiple AI systems and humans. To fully realize this potential, it is essential to develop a principled way of designing and studying such structured interactions. For this purpose, we introduce the conceptual framework of Flows: a systematic approach to modeling complex interactions. Flows are self-contained building blocks of computation, with an isolated state, communicating through a standardized message-based interface. This modular design allows Flows to be recursively composed into arbitrarily nested interactions, with a substantial reduction of complexity. Crucially, any interaction can be implemented using this framework, including prior work on AI--AI and human--AI interactions, prompt engineering schemes, and tool augmentation. We demonstrate the potential of Flows on the task of competitive coding, a challenging task on which even GPT-4 struggles. Our results suggest that structured reasoning and collaboration substantially improve generalization, with AI-only Flows adding +$21$ and human--AI Flows adding +$54$ absolute points in terms of solve rate. To support rapid and rigorous research, we introduce the aiFlows library. The library comes with a repository of Flows that can be easily used, extended, and composed into novel, more complex Flows. The aiFlows library is available at https://github.com/epfl-dlab/aiflows. Data and Flows for reproducing our experiments are available at https://github.com/epfl-dlab/cc_flows.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2308.01285

Country:

North America > United States (0.28)
Europe > United Kingdom > Scotland (0.14)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Add feedback

REFINER: Reasoning Feedback on Intermediate Representations

Paul, Debjit, Ismayilzada, Mete, Peyrard, Maxime, Borges, Beatriz, Bosselut, Antoine, West, Robert, Faltings, Boi

arXiv.org Artificial IntelligenceApr-4-2023

Language models (LMs) have recently shown remarkable performance on reasoning tasks by explicitly generating intermediate inferences, e.g., chain-of-thought prompting. However, these intermediate inference steps may be inappropriate deductions from the initial context and lead to incorrect final predictions. Here we introduce REFINER, a framework for finetuning LMs to explicitly generate intermediate reasoning steps while interacting with a critic model that provides automated feedback on the reasoning. Specifically, the critic provides structured feedback that the reasoning LM uses to iteratively improve its intermediate arguments. Empirical evaluations of REFINER on three diverse reasoning tasks show significant improvements over baseline LMs of comparable scale. Furthermore, when using GPT3.5 as the reasoner, the trained critic significantly improves reasoning without finetuning the reasoner. Finally, our critic model is trained without expensive human-in-the-loop data but can be substituted with humans at inference time.

computational linguistic, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2304.01904

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.66)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.51)
Information Technology > Artificial Intelligence > Representation & Reasoning > Abductive Reasoning (0.48)

Add feedback

Language Model Decoding as Likelihood-Utility Alignment

Josifoski, Martin, Peyrard, Maxime, Rajic, Frano, Wei, Jiheng, Paul, Debjit, Hartmann, Valentin, Patra, Barun, Chaudhary, Vishrav, Kıcıman, Emre, Faltings, Boi, West, Robert

arXiv.org Artificial IntelligenceMar-16-2023

A critical component of a successful language generation pipeline is the decoding algorithm. However, the general principles that should guide the choice of a decoding algorithm remain unclear. Previous works only compare decoding algorithms in narrow scenarios, and their findings do not generalize across tasks. We argue that the misalignment between the model's likelihood and the task-specific notion of utility is the key factor to understanding the effectiveness of decoding algorithms. To structure the discussion, we introduce a taxonomy of misalignment mitigation strategies (MMSs), providing a unifying view of decoding as a tool for alignment. The MMS taxonomy groups decoding algorithms based on their implicit assumptions about likelihood--utility misalignment, yielding general statements about their applicability across tasks. Specifically, by analyzing the correlation between the likelihood and the utility of predictions across a diverse set of tasks, we provide empirical evidence supporting the proposed taxonomy and a set of principles to structure reasoning when choosing a decoding algorithm. Crucially, our analysis is the first to relate likelihood-based decoding algorithms with algorithms that rely on external information, such as value-guided methods and prompting, and covers the most diverse set of tasks to date. Code, data, and models are available at https://github.com/epfl-dlab/understanding-decoding.

language model decoding, likelihood-utility alignment

arXiv.org Artificial Intelligence

2210.07228

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language (0.69)

Add feedback