AITopics

2402.16508

Country:

North America > United States > Washington > King County > Seattle (0.14)
Oceania > Australia > Victoria > Melbourne (0.14)
Europe > Russia (0.04)
(21 more...)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Faldu, Prayushi, Bhattacharya, Indrajit, Mausam, null

RetinaQA: A Robust Knowledge Base Question Answering Model for both Answerable and Unanswerable Questions

arXiv.org Artificial IntelligenceJun-15-2024

An essential requirement for a real-world Knowledge Base Question Answering (KBQA) system is the ability to detect answerability of questions when generating logical forms. However, state-of-the-art KBQA models assume all questions to be answerable. Recent research has found that such models, when superficially adapted to detect answerability, struggle to satisfactorily identify the different categories of unanswerable questions, and simultaneously preserve good performance for answerable questions. Towards addressing this issue, we propose RetinaQA, a new KBQA model that unifies two key ideas in a single KBQA architecture: (a) discrimination over candidate logical forms, rather than generating these, for handling schema-related unanswerability, and (b) sketch-filling-based construction of candidate logical forms for handling data-related unaswerability. Our results show that RetinaQA significantly outperforms adaptations of state-of-the-art KBQA models in handling both answerable and unanswerable questions and demonstrates robustness across all categories of unanswerability. Notably, RetinaQA also sets a new state-of-the-art for answerable KBQA, surpassing existing models.

logical form, retinaqa, unanswerable question, (14 more...)

2403.10849

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > Ontario > Toronto (0.04)
Asia > Thailand > Bangkok > Bangkok (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.86)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.62)

arXiv.org Artificial IntelligenceJun-15-2024

NewsQs: Multi-Source Question Generation for the Inquiring Mind

Hwang, Alyssa, Dixit, Kalpit, Ballesteros, Miguel, Benajiba, Yassine, Castelli, Vittorio, Dreyer, Markus, Bansal, Mohit, McKeown, Kathleen

We present NewsQs (news-cues), a dataset that provides question-answer pairs for multiple news documents. To create NewsQs, we augment a traditional multi-document summarization dataset with questions automatically generated by a T5-Large model fine-tuned on FAQ-style news articles from the News On the Web corpus. We show that fine-tuning a model with control codes produces questions that are judged acceptable more often than the same model without them as measured through human evaluation. We use a QNLI model with high correlation with human annotations to filter our data. We release our final dataset of high-quality questions, answers, and document clusters as a resource for future work in query-based multi-document summarization.

annotator, computational linguistic, dataset, (14 more...)

2402.18479

Country:

Europe > United Kingdom (0.14)
Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > Pennsylvania (0.04)
(18 more...)

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Industry: Government > Voting & Elections (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.67)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.65)

Enhancing Question Answering on Charts Through Effective Pre-training Tasks

Gupta, Ashim, Gupta, Vivek, Zhang, Shuo, He, Yujie, Zhang, Ning, Shah, Shalin

To completely understand a document, the use of textual information is not enough. Understanding visual cues, such as layouts and charts, is also required. While the current state-of-the-art approaches for document understanding (both OCR-based and OCR-free) work well, a thorough analysis of their capabilities and limitations has not yet been performed. Therefore, in this work, we addresses the limitation of current VisualQA models when applied to charts and plots. To investigate shortcomings of the state-of-the-art models, we conduct a comprehensive behavioral analysis, using ChartQA as a case study. Our findings indicate that existing models particularly underperform in answering questions related to the chart's structural and visual context, as well as numerical information. To address these issues, we propose three simple pre-training tasks that enforce the existing model in terms of both structural-visual knowledge, as well as its understanding of numerical questions. We evaluate our pre-trained model (called MatCha-v2) on three chart datasets - both extractive and abstractive question datasets - and observe that it achieves an average improvement of 1.7% over the baseline model.

checklist, computational linguistic, dataset, (14 more...)

2406.10085

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
Oceania > Fiji (0.05)
North America > United States > Utah (0.05)
(8 more...)

Genre:

Research Report > Promising Solution (0.54)
Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Jhalani, Manas, M, Annervaz K, Bhattacharyya, Pushpak

Precision Empowers, Excess Distracts: Visual Question Answering With Dynamically Infused Knowledge In Language Models

In the realm of multimodal tasks, Visual Question Answering (VQA) plays a crucial role by addressing natural language questions grounded in visual content. Knowledge-Based Visual Question Answering (KBVQA) advances this concept by adding external knowledge along with images to respond to questions. We introduce an approach for KBVQA, augmenting the existing vision-language transformer encoder-decoder (OFA) model. Our main contribution involves enhancing questions by incorporating relevant external knowledge extracted from knowledge graphs, using a dynamic triple extraction method. We supply a flexible number of triples from the knowledge graph as context, tailored to meet the requirements for answering the question. Our model, enriched with knowledge, demonstrates an average improvement of 4.75\% in Exact Match Score over the state-of-the-art on three different KBVQA datasets. Through experiments and analysis, we demonstrate that furnishing variable triples for each question improves the reasoning capabilities of the language model in contrast to supplying a fixed number of triples. This is illustrated even for recent large language models. Additionally, we highlight the model's generalization capability by showcasing its SOTA-beating performance on a small dataset, achieved through straightforward fine-tuning.

dataset, information, knowledge, (14 more...)

2406.09994

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Slovakia (0.04)
North America > United States > New York > New York County > New York City (0.04)
(6 more...)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.82)
(2 more...)

EWEK-QA: Enhanced Web and Efficient Knowledge Graph Retrieval for Citation-based Question Answering Systems

Dehghan, Mohammad, Alomrani, Mohammad Ali, Bagga, Sunyam, Alfonso-Hermelo, David, Bibi, Khalil, Ghaddar, Abbas, Zhang, Yingxue, Li, Xiaoguang, Hao, Jianye, Liu, Qun, Lin, Jimmy, Chen, Boxing, Parthasarathi, Prasanna, Biparva, Mahdi, Rezagholizadeh, Mehdi

The emerging citation-based QA systems are gaining more attention especially in generative AI search applications. The importance of extracted knowledge provided to these systems is vital from both accuracy (completeness of information) and efficiency (extracting the information in a timely manner). In this regard, citation-based QA systems are suffering from two shortcomings. First, they usually rely only on web as a source of extracted knowledge and adding other external knowledge sources can hamper the efficiency of the system. Second, web-retrieved contents are usually obtained by some simple heuristics such as fixed length or breakpoints which might lead to splitting information into pieces. To mitigate these issues, we propose our enhanced web and efficient knowledge graph (KG) retrieval solution (EWEK-QA) to enrich the content of the extracted knowledge fed to the system. This has been done through designing an adaptive web retriever and incorporating KGs triples in an efficient manner. We demonstrate the effectiveness of EWEK-QA over the open-source state-of-the-art (SoTA) web-based and KG baseline models using a comprehensive set of quantitative and human evaluation experiments. Our model is able to: first, improve the web-retriever baseline in terms of extracting more relevant passages (>20\%), the coverage of answer span (>25\%) and self containment (>35\%); second, obtain and integrate KG triples into its pipeline very efficiently (by avoiding any LLM calls) to outperform the web-only and KG-only SoTA baselines significantly in 7 quantitative QA tasks and our human evaluation.

dataset, ewek-qa, knowledge, (16 more...)

2406.10393

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.05)
Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.05)
(14 more...)

Genre: Research Report (0.82)

Industry:

Media (0.94)
Information Technology (0.93)
Leisure & Entertainment (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

CausalChaos! Dataset for Comprehensive Causal Action Question Answering Over Longer Causal Chains Grounded in Dynamic Visual Scenes

Parmar, Paritosh, Peh, Eric, Chen, Ruirui, Lam, Ting En, Chen, Yuhan, Tan, Elston, Fernando, Basura

Causal video question answering (QA) has garnered increasing interest, yet existing datasets often lack depth in causal reasoning. To address this gap, we capitalize on the unique properties of cartoons and construct CausalChaos!, a novel, challenging causal Why-QA dataset built upon the iconic "Tom and Jerry" cartoon series. Cartoons use the principles of animation that allow animators to create expressive, unambiguous causal relationships between events to form a coherent storyline. Utilizing these properties, along with thought-provoking questions and multi-level answers (answer and detailed causal explanation), our questions involve causal chains that interconnect multiple dynamic interactions between characters and visual scenes. These factors demand models to solve more challenging, yet well-defined causal relationships. We also introduce hard incorrect answer mining, including a causally confusing version that is even more challenging. While models perform well, there is much room for improvement, especially, on open-ended answers. We identify more advanced/explicit causal relationship modeling & joint modeling of vision and language as the immediate areas for future efforts to focus upon. Along with the other complementary datasets, our new challenging dataset will pave the way for these developments in the field.

dataset, proceedings, reasoning, (11 more...)

2404.01299

Country:

Asia > Singapore (0.05)
Europe > Netherlands > North Holland > Amsterdam (0.04)
North America > Canada > British Columbia > East Kootenay Region > Fernie (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.96)
(2 more...)

Phukan, Orchid Chetia, Mallick, Priyabrata, Behera, Swarup Ranjan, Narayani, Aalekhya Satya, Buduru, Arun Balaji, Sharma, Rajesh

Towards Multilingual Audio-Visual Question Answering

arXiv.org Artificial IntelligenceJun-13-2024

In this paper, we work towards extending Audio-Visual Question Answering (AVQA) to multilingual settings. Existing AVQA research has predominantly revolved around English and replicating it for addressing AVQA in other languages requires a substantial allocation of resources. As a scalable solution, we leverage machine translation and present two multilingual AVQA datasets for eight languages created from existing benchmark AVQA datasets. This prevents extra human annotation efforts of collecting questions and answers manually. To this end, we propose, MERA framework, by leveraging state-of-the-art (SOTA) video, audio, and textual foundation models for AVQA in multiple languages. We introduce a suite of models namely MERA-L, MERA-C, MERA-T with varied model architectures to benchmark the proposed datasets. We believe our work will open new research directions and act as a reference benchmark for future works in multilingual AVQA.

avqa, dataset, mera-c, (16 more...)

2406.09156

Country:

Europe > Estonia > Tartu County > Tartu (0.04)
Asia > Singapore (0.04)
Asia > Middle East > Republic of Türkiye (0.04)
(3 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.50)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.49)

arXiv.org Artificial IntelligenceJun-13-2024

MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos

He, Xuehai, Feng, Weixi, Zheng, Kaizhi, Lu, Yujie, Zhu, Wanrong, Li, Jiachen, Fan, Yue, Wang, Jianfeng, Li, Linjie, Yang, Zhengyuan, Lin, Kevin, Wang, William Yang, Wang, Lijuan, Wang, Xin Eric

Multimodal Language Language Models (MLLMs) demonstrate the emerging abilities of "world models" -- interpreting and reasoning about complex real-world dynamics. To assess these abilities, we posit videos are the ideal medium, as they encapsulate rich representations of real-world dynamics and causalities. To this end, we introduce MMWorld, a new benchmark for multi-discipline, multi-faceted multimodal video understanding. MMWorld distinguishes itself from previous video understanding benchmarks with two unique advantages: (1) multi-discipline, covering various disciplines that often require domain expertise for comprehensive understanding; (2) multi-faceted reasoning, including explanation, counterfactual thinking, future prediction, etc. MMWorld consists of a human-annotated dataset to evaluate MLLMs with questions about the whole videos and a synthetic dataset to analyze MLLMs within a single modality of perception. Together, MMWorld encompasses 1,910 videos across seven broad disciplines and 69 subdisciplines, complete with 6,627 question-answer pairs and associated captions. The evaluation includes 2 proprietary and 10 open-source MLLMs, which struggle on MMWorld (e.g., GPT-4V performs the best with only 52.3\% accuracy), showing large room for improvement. Further ablation studies reveal other interesting findings such as models' different skill sets from humans. We hope MMWorld can serve as an essential step towards world model evaluation in videos.

arxiv preprint arxiv, dataset, video, (15 more...)

2406.08407

Country: Europe > Middle East > Malta > Eastern Region > Northern Harbour District > St. Julian's (0.04)

Genre: Research Report (0.82)

Industry:

Health & Medicine (1.00)
Information Technology (0.93)
Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(4 more...)

Tami, Mohammad, Ashqar, Huthaifa I., Elhenawy, Mohammed

Automated Question Generation for Science Tests in Arabic Language Using NLP Techniques

arXiv.org Artificial IntelligenceJun-11-2024

Question generation for education assessments is a growing field within artificial intelligence applied to education. These question-generation tools have significant importance in the educational technology domain, such as intelligent tutoring systems and dialogue-based platforms. The automatic generation of assessment questions, which entail clear-cut answers, usually relies on syntactical and semantic indications within declarative sentences, which are then transformed into questions. Recent research has explored the generation of assessment educational questions in Arabic. The reported performance has been adversely affected by inherent errors, including sentence parsing inaccuracies, name entity recognition issues, and errors stemming from rule-based question transformation. Furthermore, the complexity of lengthy Arabic sentences has contributed to these challenges. This research presents an innovative Arabic question-generation system built upon a three-stage process: keywords and key phrases extraction, question generation, and subsequent ranking. The aim is to tackle the difficulties associated with automatically generating assessment questions in the Arabic language. The proposed approach and results show a precision of 83.50%, a recall of 78.68%, and an Fl score of 80.95%, indicating the framework high efficiency. Human evaluation further confirmed the model efficiency, receiving an average rating of 84%.

machine learning, question answering, question generation, (20 more...)

2406.0852

Country:

Asia > Middle East > Palestine (0.04)
Oceania > Australia > Queensland > Brisbane (0.04)
North America > United States (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)

Genre: Research Report > New Finding (0.67)

Industry:

Education > Educational Technology > Educational Software > Computer Based Training (0.55)
Education > Assessment & Standards > Student Performance (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)