AITopics | Montariol, Syrielle

VinaBench: Benchmark for Faithful and Consistent Visual Narratives

Gao, Silin, Mathew, Sheryl, Mi, Li, Mamooler, Sepideh, Zhao, Mengjie, Wakaki, Hiromi, Mitsufuji, Yuki, Montariol, Syrielle, Bosselut, Antoine

arXiv.org Artificial Intelligence, Apr-3-2025

Visual narrative generation transforms textual narratives into sequences of images illustrating the content of the text. However, generating visual narratives that are faithful to the input text and self-consistent across generated images remains an open challenge, due to the lack of knowledge constraints used for planning the stories. In this work, we propose a new benchmark, VinaBench, to address this challenge. Our benchmark annotates the underlying commonsense and discourse constraints in visual narrative samples, offering systematic scaffolds for learning the implicit strategies of visual storytelling. Based on the incorporated narrative constraints, we further propose novel metrics to closely evaluate the consistency of generated narrative images and the alignment of generations with the input textual narrative. Our results across three generative vision models demonstrate that learning with VinaBench's knowledge constraints effectively improves the faithfulness and cohesion of generated visual narratives.

large language model, machine learning, natural language, (18 more...)

arXiv:2503.20871

Genre: Research Report > New Finding (0.48)

Industry: Media (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.52)

DRIVINGVQA: Analyzing Visual Chain-of-Thought Reasoning of Vision Language Models in Real-World Scenarios with Driving Theory Tests

Corbière, Charles, Roburin, Simon, Montariol, Syrielle, Bosselut, Antoine, Alahi, Alexandre

arXiv.org Artificial Intelligence, Jan-8-2025

Large vision-language models (LVLMs) augment language models with visual understanding, enabling multimodal reasoning. However, due to the modality gap between textual and visual data, they often face significant challenges, such as over-reliance on text priors, hallucinations, and limited capacity for complex visual reasoning. Existing benchmarks to evaluate visual reasoning in LVLMs often rely on schematic or synthetic images and on imprecise machine-generated explanations. To bridge the modality gap, we present DrivingVQA, a new benchmark derived from driving theory tests to evaluate visual chain-of-thought reasoning in complex real-world scenarios. It offers 3,931 expert-crafted multiple-choice problems and interleaved explanations grounded with entities relevant to the reasoning process. We leverage this dataset to perform an extensive study of LVLMs' ability to reason about complex visual scenarios. Our experiments reveal that open-source and proprietary LVLMs struggle with visual chain-of-thought reasoning under zero-shot settings. We investigate training strategies that leverage relevant entities to improve visual reasoning. Notably, we observe a performance boost of up to 7% when reasoning over image tokens of cropped regions tied to these entities.

large language model, machine learning, natural language, (20 more...)

arXiv:2501.04671

Country:

Europe (0.28)
North America > United States (0.28)

Genre: Research Report (1.00)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)
Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

PICLe: Pseudo-Annotations for In-Context Learning in Low-Resource Named Entity Detection

Mamooler, Sepideh, Montariol, Syrielle, Mathis, Alexander, Bosselut, Antoine

arXiv.org Artificial Intelligence, Dec-16-2024

In-context learning (ICL) enables Large Language Models (LLMs) to perform tasks using few demonstrations, facilitating task adaptation when labeled examples are hard to obtain. However, ICL is sensitive to the choice of demonstrations, and it remains unclear which demonstration attributes enable in-context generalization. In this work, we conduct a perturbation study of in-context demonstrations for low-resource Named Entity Detection (NED). Our surprising finding is that in-context demonstrations with partially correct annotated entity mentions can be as effective for task transfer as fully correct demonstrations. Based on our findings, we propose Pseudo-annotated In-Context Learning (PICLe), a framework for in-context learning with noisy, pseudo-annotated demonstrations. PICLe leverages LLMs to annotate many demonstrations in a zero-shot first pass. We then cluster these synthetic demonstrations, sample specific sets of in-context demonstrations from each cluster, and predict entity mentions using each set independently. Finally, we use self-verification to select the final set of entity mentions. We evaluate PICLe on five biomedical NED datasets and show that, with zero human annotation, PICLe outperforms ICL in low-resource settings where limited gold examples can be used as in-context demonstrations.

demonstration, large language model, machine learning, (18 more...)

arXiv:2412.11923

Country:

Asia (0.47)
North America > Canada (0.15)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge

Romanou, Angelika, Foroutan, Negar, Sotnikova, Anna, Chen, Zeming, Nelaturu, Sree Harsha, Singh, Shivalika, Maheshwary, Rishabh, Altomare, Micol, Haggag, Mohamed A., A, Snegha, Amayuelas, Alfonso, Amirudin, Azril Hafizi, Aryabumi, Viraat, Boiko, Danylo, Chang, Michael, Chim, Jenny, Cohen, Gal, Dalmia, Aditya Kumar, Diress, Abraham, Duwal, Sharad, Dzenhaliou, Daniil, Florez, Daniel Fernando Erazo, Farestam, Fabian, Imperial, Joseph Marvin, Islam, Shayekh Bin, Isotalo, Perttu, Jabbarishiviari, Maral, Karlsson, Börje F., Khalilov, Eldar, Klamm, Christopher, Koto, Fajri, Krzemiński, Dominik, de Melo, Gabriel Adriano, Montariol, Syrielle, Nan, Yiyang, Niklaus, Joel, Novikova, Jekaterina, Ceron, Johan Samir Obando, Paul, Debjit, Ploeger, Esther, Purbey, Jebish, Rajwal, Swati, Ravi, Selvan Sunitha, Rydell, Sara, Santhosh, Roshan, Sharma, Drishti, Skenduli, Marjana Prifti, Moakhar, Arshia Soltani, Moakhar, Bardia Soltani, Tamir, Ran, Tarun, Ayush Kumar, Wasi, Azmine Toushik, Weerasinghe, Thenuka Ovin, Yilmaz, Serhan, Zhang, Mike, Schlag, Imanol, Fadaee, Marzieh, Hooker, Sara, Bosselut, Antoine

arXiv.org Artificial Intelligence, Nov-29-2024

The performance differential of large language models (LLMs) between languages hinders their effective deployment in many regions, inhibiting the potential economic and societal value of generative AI tools in many communities. However, the development of functional LLMs in many languages (i.e., multilingual LLMs) is bottlenecked by the lack of high-quality evaluation resources in languages other than English. Moreover, current practices in multilingual benchmark construction often translate English resources, ignoring the regional and cultural knowledge of the environments in which multilingual systems would be used. In this work, we construct an evaluation suite of 197,243 QA pairs from local exam sources to measure the capabilities of multilingual LLMs in a variety of regional contexts.

The rapid advancement of AI technologies underscores the importance of developing LLMs that are proficient across diverse linguistic and cultural contexts, ensuring fair and equitable performance for stakeholders from various language groups. However, the lack of high-quality evaluation benchmarks in many languages discourages practitioners from training multilingual LLMs to meet this challenge. This evaluation gap limits the effective deployment of LLMs for many regions, exacerbates digital divides, and inhibits the economic and societal value of AI tools in many underserved communities. The source of this gap is the multitude of challenges in evaluating LLMs for multilingual contexts. First, at a meta-level, the majority of benchmarks for LLMs are only in English (Hendrycks et al., 2020, inter alia). Technical challenges also abound due to the manner in which multilingual datasets are often collected. Certain datasets are constructed using manually applied templates, resulting in low prompt and completion diversity (Muennighoff et al., 2022). Many more are composed of translations from high-resource languages (e.g., English; Holtermann et al., 2024; Myung et al., 2024; Lai et al., 2023; Foroutan et al., 2023). These datasets often contain errors (Ponti et al., 2020; Plaza et al., 2024) and create translationese artifacts (Vanmassenhove et al., 2021; Hartung et al., 2023; Savoldi et al., 2021; Ji et al., 2023).

large language model, machine learning, include, (18 more...)

arXiv:2411.19799

Country:

Europe (0.92)
North America > United States (0.46)
Asia > Middle East (0.28)

Genre: Research Report > New Finding (0.67)

Industry:

Government (1.00)
Education > Curriculum > Subject-Specific Education (0.92)
Education > Educational Setting (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

"Flex Tape Can't Fix That": Bias and Misinformation in Edited Language Models

Halevy, Karina, Sotnikova, Anna, AlKhamissi, Badr, Montariol, Syrielle, Bosselut, Antoine

arXiv.org Artificial Intelligence, Jun-16-2024

Model editing has emerged as a cost-effective strategy to update knowledge stored in language models. However, model editing can have unintended consequences after edits are applied: information unrelated to the edits can also be changed, and other general behaviors of the model can be wrongly altered. In this work, we investigate how model editing methods unexpectedly amplify model biases post-edit. We introduce a novel benchmark dataset, Seesaw-CF, for measuring bias-related harms of model editing and conduct the first in-depth investigation of how different weight-editing methods impact model bias. Specifically, we focus on biases with respect to demographic attributes such as race, geographic origin, and gender, as well as qualitative flaws in long-form texts generated by edited language models. We find that edited models exhibit, to various degrees, more biased behavior as they become less confident in attributes for Asian, African, and South American subjects. Furthermore, edited models amplify sexism and xenophobia in text generations while remaining seemingly coherent and logical. Finally, editing facts about place of birth, country of citizenship, or gender has particularly negative effects on the model's knowledge about unrelated features like field of work.

citizenship, large language model, machine learning, (22 more...)

arXiv:2403.0018

Country:

Europe (1.00)
Asia (1.00)
North America > United States > Maryland (0.14)

Genre: Research Report > New Finding (0.94)

Industry:

Law (0.47)
Media > News (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)

Course Recommender Systems Need to Consider the Job Market

Frej, Jibril, Dai, Anna, Montariol, Syrielle, Bosselut, Antoine, Käser, Tanja

arXiv.org Artificial Intelligence, May-1-2024

Current course recommender systems primarily leverage learner-course interactions, course content, learner preferences, and supplementary course details like instructor, institution, ratings, and reviews, to make their recommendations. However, these systems often overlook a critical aspect: the evolving skill demand of the job market. This paper focuses on the perspective of academic researchers, working in collaboration with the industry, aiming to develop a course recommender system that incorporates job market skill demands. In light of the job market's rapid changes and the current state of research in course recommender systems, we outline essential properties for course recommender systems to address these demands effectively, including being explainable, sequential, unsupervised, and aligned with the job market and the user's goals. Our discussion extends to the challenges and research questions this objective entails, including unsupervised skill extraction from job listings, course descriptions, and resumes, as well as predicting recommendations that align with learner objectives and the job market and designing metrics to evaluate this alignment. Furthermore, we introduce an initial system that addresses some existing limitations of course recommender systems using Large Language Models (LLMs) for skill extraction and Reinforcement Learning (RL) for alignment with the job market. We provide empirical results using open-source data to demonstrate its effectiveness.

artificial intelligence, natural language, recommender system, (16 more...)

doi: 10.1145/3626772.3657847

arXiv:2404.10876

Country:

Europe (1.00)
Oceania > Australia > New South Wales (0.14)
North America > United States > Hawaii (0.14)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)

Genre:

Research Report > New Finding (1.00)
Instructional Material > Course Syllabus & Notes (1.00)

Industry:

Education > Educational Setting > Online (0.95)
Education > Educational Technology > Educational Software > Computer Based Training (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)

Multi-Task Learning for Features Extraction in Financial Annual Reports

Montariol, Syrielle, Martinc, Matej, Pelicon, Andraž, Pollak, Senja, Koloski, Boshko, Lončarski, Igor, Valentinčič, Aljoša

arXiv.org Artificial Intelligence, Apr-8-2024

For assessing various performance indicators of companies, the focus is shifting from strictly financial (quantitative) publicly disclosed information to qualitative (textual) information. This textual data can provide valuable weak signals, for example through stylistic features, which can complement the quantitative data on financial performance or on Environmental, Social and Governance (ESG) criteria. In this work, we use various multi-task learning methods for financial text classification with the focus on financial sentiment, objectivity, forward-looking sentence prediction and ESG-content detection. We propose different methods to combine the information extracted from training jointly on different tasks; our best-performing method highlights the positive effect of explicitly adding auxiliary task predictions as features for the final target task during the multi-task training. Next, we use these classifiers to extract textual features from annual reports of FTSE350 companies and investigate the link between ESG quantitative scores and these features.

artificial intelligence, machine learning, natural language, (15 more...)

arXiv:2404.05281

Country:

Europe (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.64)

Industry:

Banking & Finance (1.00)
Government (0.68)
Law > Business Law (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

ConVQG: Contrastive Visual Question Generation with Multimodal Guidance

Mi, Li, Montariol, Syrielle, Castillo-Navarro, Javiera, Dai, Xianjie, Bosselut, Antoine, Tuia, Devis

arXiv.org Artificial Intelligence, Feb-20-2024

Asking questions about visual environments is a crucial way for intelligent agents to understand rich multi-faceted scenes, raising the importance of Visual Question Generation (VQG) systems. Apart from being grounded to the image, existing VQG systems can use textual constraints, such as expected answers or knowledge triplets, to generate focused questions. These constraints allow VQG systems to specify the question content or leverage external commonsense knowledge that cannot be obtained from the image content alone. However, generating focused questions using textual constraints while enforcing a high relevance to the image content remains a challenge, as VQG systems often ignore one or both forms of grounding. In this work, we propose Contrastive Visual Question Generation (ConVQG), a method using a dual contrastive objective to discriminate questions generated using both modalities from those based on a single one. Experiments on both knowledge-aware and standard VQG benchmarks demonstrate that ConVQG outperforms the state-of-the-art methods and generates image-grounded, text-guided, and knowledge-rich questions. Our human evaluation results also show preference for ConVQG questions compared to non-contrastive baselines.

machine learning, natural language, question answering, (18 more...)

arXiv:2402.12846

Country: North America > United States (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Rethinking Skill Extraction in the Job Market Domain using Large Language Models

Nguyen, Khanh Cao, Zhang, Mike, Montariol, Syrielle, Bosselut, Antoine

arXiv.org Artificial Intelligence, Feb-6-2024

Skill Extraction involves identifying skills and qualifications mentioned in documents such as job postings and resumes. The task is commonly tackled by training supervised models using a sequence labeling approach with BIO tags. However, the reliance on manually annotated data limits the generalizability of such approaches. Moreover, the common BIO setting limits the ability of the models to capture complex skill patterns and handle ambiguous mentions. In this paper, we explore the use of in-context learning to overcome these challenges, on a benchmark of 6 uniformized skill extraction datasets. Our approach leverages the few-shot learning capabilities of large language models (LLMs) to identify and extract skills from sentences. We show that LLMs, despite not being on par with traditional supervised models in terms of performance, can better handle syntactically complex skill mentions in skill extraction tasks.
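As an illustration of the BIO sequence-labeling setup mentioned in the abstract above, here is a minimal sketch of how per-token tags are collapsed into skill spans. It is not taken from the paper: the function name `bio_to_spans`, the plain `B`/`I`/`O` tag set, and the example sentence are assumptions made for illustration.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[int, int, str]]:
    """Collapse per-token BIO tags into (start, end_exclusive, text) spans.

    Hypothetical helper for illustration; assumes plain "B", "I", "O" tags.
    """
    spans = []
    start = None  # index where the currently open span began, if any
    for i, tag in enumerate(tags):
        if tag == "B":
            # A new span begins; close any span that is still open.
            if start is not None:
                spans.append((start, i, " ".join(tokens[start:i])))
            start = i
        elif tag == "I":
            # Continue the open span; a stray "I" leniently opens a new one.
            if start is None:
                start = i
        else:
            # "O" (or anything else) closes the open span.
            if start is not None:
                spans.append((start, i, " ".join(tokens[start:i])))
                start = None
    if start is not None:
        spans.append((start, len(tags), " ".join(tokens[start:])))
    return spans


tokens = ["Experience", "with", "Python", "and", "project", "management", "required"]
tags = ["O", "O", "B", "O", "B", "I", "O"]
print(bio_to_spans(tokens, tags))
```

Because each span is a contiguous run of tokens, a mention whose parts are separated by other words cannot be expressed as a single span in this scheme, which is one way to read the abstract's remark about complex skill patterns.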

large language model, machine learning, natural language, (19 more...)

arXiv:2402.03832

Country:

Europe (1.00)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
North America > United States > Washington > King County > Seattle (0.14)
North America > United States > Texas (0.14)

Genre: Research Report (0.82)

Industry: Banking & Finance (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

JOBSKAPE: A Framework for Generating Synthetic Job Postings to Enhance Skill Matching

Magron, Antoine, Dai, Anna, Zhang, Mike, Montariol, Syrielle, Bosselut, Antoine

arXiv.org Artificial Intelligence, Feb-5-2024

Recent approaches in skill matching, employing synthetic training data for classification or similarity model training, have shown promising results, reducing the need for time-consuming and expensive annotations. However, previous synthetic datasets have limitations, such as featuring only one skill per sentence and generally comprising short sentences. In this paper, we introduce JobSkape, a framework to generate synthetic data that tackles these limitations, specifically designed to enhance skill-to-taxonomy matching. Within this framework, we create SkillSkape, a comprehensive open-source synthetic dataset of job postings tailored for skill-matching tasks. We introduce several offline metrics that show that our dataset resembles real-world data. Additionally, we present a multi-step pipeline for skill extraction and matching tasks using large language models (LLMs), benchmarking against known supervised methodologies. We show that the downstream evaluation results on real-world data can beat baselines, underscoring the framework's efficacy and adaptability.

large language model, machine learning, natural language, (22 more...)

arXiv:2402.03242

Country:

Europe (1.00)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Information Technology > Security & Privacy (0.68)
Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
