AITopics

2310.16995

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.87)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.80)

Yoon, Hokeun, Bak, JinYeong

Diversity Enhanced Narrative Question Generation for Storybooks

arXiv.org Artificial IntelligenceOct-25-2023

Question generation (QG) from a given context can enhance comprehension, engagement, assessment, and overall efficacy in learning or conversational environments. Despite recent advancements in QG, the challenge of enhancing or measuring the diversity of generated questions often remains unaddressed. In this paper, we introduce a multi-question generation model (mQG), which is capable of generating multiple, diverse, and answerable questions by focusing on context and questions. To validate the answerability of the generated questions, we employ a SQuAD2.0 fine-tuned question answering model, classifying the questions as answerable or not. We train and evaluate mQG on the FairytaleQA dataset, a well-structured QA dataset based on storybooks, with narrative questions. We further apply a zero-shot adaptation on the TellMeWhy and SQuAD1.1 datasets. mQG shows promising results across various evaluation metrics, among strong baselines.

diversity enhanced narrative question generation, storybook

2310.16446

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)

Kawaharazuka, Kento, Obinata, Yoshiki, Kanazawa, Naoaki, Okada, Kei, Inaba, Masayuki

Binary State Recognition by Robots using Visual Question Answering of Pre-Trained Vision-Language Model

arXiv.org Artificial IntelligenceOct-25-2023

Recognition of the current state is indispensable for the operation of a robot. There are various states to be recognized, such as whether an elevator door is open or closed, whether an object has been grasped correctly, and whether the TV is turned on or off. Until now, these states have been recognized by programmatically describing the state of a point cloud or raw image, by annotating and learning images, by using special sensors, etc. In contrast to these methods, we apply Visual Question Answering (VQA) from a Pre-Trained Vision-Language Model (PTVLM) trained on a large-scale dataset, to such binary state recognition. This idea allows us to intuitively describe state recognition in language without any re-training, thereby improving the recognition ability of robots in a simple and general way. We summarize various techniques in questioning methods and image processing, and clarify their properties through experiments.

binary state recognition, pre-trained vision-language model, robot

2310.16405

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Robots (0.80)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.60)

Wu, Yating, Sheffield, William, Mahowald, Kyle, Li, Junyi Jessy

Elaborative Simplification as Implicit Questions Under Discussion

arXiv.org Artificial IntelligenceOct-24-2023

Automated text simplification, a technique useful for making text more accessible to people such as children and emergent bilinguals, is often thought of as a monolingual translation task from complex sentences to simplified sentences using encoder-decoder models. This view fails to account for elaborative simplification, where new information is added into the simplified text. This paper proposes to view elaborative simplification through the lens of the Question Under Discussion (QUD) framework, providing a robust way to investigate what writers elaborate upon, how they elaborate, and how elaborations fit into the discourse context by viewing elaborations as explicit answers to implicit questions. We introduce ElabQUD, consisting of 1.3K elaborations accompanied with implicit QUDs, to study these phenomena. We show that explicitly modeling QUD (via question generation) not only provides essential understanding of elaborative simplification and how the elaborations connect with the rest of the discourse, but also substantially improves the quality of elaboration generation.

elaboration, qud, simplification, (16 more...)

2305.10387

Country:

North America > Mexico (0.04)
North America > Honduras (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
(5 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Education (0.68)
Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.49)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.47)

arXiv.org Artificial IntelligenceOct-24-2023

MaXM: Towards Multilingual Visual Question Answering

Changpinyo, Soravit, Xue, Linting, Yarom, Michal, Thapliyal, Ashish V., Szpektor, Idan, Amelot, Julien, Chen, Xi, Soricut, Radu

Visual Question Answering (VQA) has been primarily studied through the lens of the English language. Yet, tackling VQA in other languages in the same manner would require a considerable amount of resources. In this paper, we propose scalable solutions to multilingual visual question answering (mVQA), on both data and modeling fronts. We first propose a translation-based framework to mVQA data generation that requires much less human annotation efforts than the conventional approach of directly collection questions and answers. Then, we apply our framework to the multilingual captions in the Crossmodal-3600 dataset and develop an efficient annotation protocol to create MaXM, a test-only VQA benchmark in 7 diverse languages. Finally, we develop a simple, lightweight, and effective approach as well as benchmark state-of-the-art English and multilingual VQA models. We hope that our benchmark encourages further research on mVQA.

benchmark, caption, dataset, (17 more...)

2209.05401

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Coman, Andrei C., Barlacchi, Gianni, de Gispert, Adrià

Strong and Efficient Baselines for Open Domain Conversational Question Answering

Unlike the Open Domain Question Answering (ODQA) setting, the conversational (ODConvQA) domain has received limited attention when it comes to reevaluating baselines for both efficiency and effectiveness. In this paper, we study the State-of-the-Art (SotA) Dense Passage Retrieval (DPR) retriever and Fusion-in-Decoder (FiD) reader pipeline, and show that it significantly underperforms when applied to ODConvQA tasks due to various limitations. We then propose and evaluate strong yet simple and efficient baselines, by introducing a fast reranking component between the retriever and the reader, and by performing targeted finetuning steps. Experiments on two ODConvQA tasks, namely TopiOCQA and OR-QuAC, show that our method improves the SotA results, while reducing reader's latency by 60%. Finally, we provide new and valuable insights into the development of challenging baselines that serve as a reference for future, more intricate approaches, including those that leverage Large Language Models (LLMs).

computational linguistic, opi ocqa, semanticreranker, (14 more...)

2310.14708

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Washington > King County > Seattle (0.04)
Europe > Italy > Tuscany > Florence (0.04)
(4 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.86)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.63)

A Review of Reinforcement Learning for Natural Language Processing, and Applications in Healthcare

Liu, Ying, Wang, Haozhu, Zhou, Huixue, Li, Mingchen, Hou, Yu, Zhou, Sicheng, Wang, Fang, Hoetzlein, Rama, Zhang, Rui

Reinforcement learning (RL) has emerged as a powerful approach for tackling complex medical decision-making problems such as treatment planning, personalized medicine, and optimizing the scheduling of surgeries and appointments. It has gained significant attention in the field of Natural Language Processing (NLP) due to its ability to learn optimal strategies for tasks such as dialogue systems, machine translation, and question-answering. This paper presents a review of the RL techniques in NLP, highlighting key advancements, challenges, and applications in healthcare. The review begins by visualizing a roadmap of machine learning and its applications in healthcare. And then it explores the integration of RL with NLP tasks. We examined dialogue systems where RL enables the learning of conversational strategies, RL-based machine translation models, question-answering systems, text summarization, and information extraction. Additionally, ethical considerations and biases in RL-NLP systems are addressed.

application, natural language processing, reinforcement learning, (1 more...)

2310.18354

Genre: Research Report (0.40)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.60)

Hashemi, Maryam, Mahmoudi, Ghazaleh, Kodeiri, Sara, Sheikhi, Hadi, Eetemadi, Sauleh

LXMERT Model Compression for Visual Question Answering

Large-scale pretrained models such as LXMERT are becoming popular for learning cross-modal representations on text-image pairs for vision-language tasks. According to the lottery ticket hypothesis, NLP and computer vision models contain smaller subnetworks capable of being trained in isolation to full performance. In this paper, we combine these observations to evaluate whether such trainable subnetworks exist in LXMERT when fine-tuned on the VQA task. In addition, we perform a model size cost-benefit analysis by investigating how much pruning can be done without significant loss in accuracy. Our experiment results demonstrate that LXMERT can be effectively pruned by 40%-60% in size with 3% loss in accuracy.

lxmert, pruning, subnetwork, (14 more...)

2310.15325

Country: Asia > Middle East > Iran (0.05)

Genre: Research Report > New Finding (0.49)

Technology:

Information Technology > Artificial Intelligence > Vision (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.43)

Suwono, Nicholas Collin, Chen, Justin Chih-Yao, Hung, Tun Min, Huang, Ting-Hao Kenneth, Liao, I-Bin, Li, Yung-Hui, Ku, Lun-Wei, Sun, Shao-Hua

Location-Aware Visual Question Generation with Lightweight Models

lightweight model, location-aware visual question generation

This work introduces a novel task, location-aware visual question generation (LocaVQG), which aims to generate engaging questions from data relevant to a particular geographical location. Specifically, we represent such location-aware information with surrounding images and a GPS coordinate. To tackle this task, we present a dataset generation pipeline that leverages GPT-4 to produce diverse and sophisticated questions. Then, we aim to learn a lightweight model that can address the LocaVQG task and fit on an edge device, such as a mobile phone. To this end, we propose a method which can reliably generate engaging questions from location-aware information. Our proposed method outperforms baselines regarding human evaluation (e.g., engagement, grounding, coherence) and automatic evaluation metrics (e.g., BERTScore, ROUGE-2). Moreover, we conduct extensive ablation studies to justify our proposed techniques for both generating the dataset and solving the task.

2310.15129

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.60)

API-Assisted Code Generation for Question Answering on Varied Table Structures

Cao, Yihan, Chen, Shuyi, Liu, Ryan, Wang, Zhiruo, Fried, Daniel

A persistent challenge to table question answering (TableQA) by generating executable programs has been adapting to varied table structures, typically requiring domain-specific logical forms. In response, this paper introduces a unified TableQA framework that: (1) provides a unified representation for structured tables as multi-index Pandas data frames, (2) uses Python as a powerful querying language, and (3) uses few-shot prompting to translate NL questions into Python programs, which are executable on Pandas data frames. Furthermore, to answer complex relational questions with extended program functionality and external knowledge, our framework allows customized APIs that Python programs can call. We experiment with four TableQA datasets that involve tables of different structures -- relational, multi-table, and hierarchical matrix shapes -- and achieve prominent improvements over past state-of-the-art systems. In ablation studies, we (1) show benefits from our multi-index representation and APIs over baselines that use only an LLM, and (2) demonstrate that our approach is modular and can incorporate additional APIs.

computational linguistic, dataset, table structure, (14 more...)

2310.14687

Country:

North America > Canada > Manitoba > Winnipeg Metropolitan Region > Winnipeg (0.04)
Asia > China > Beijing > Beijing (0.04)
North America > United States > Washington > King County > Seattle (0.04)
(10 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Software (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)