AITopics | Lu, Xinyuan

Collaborating Authors

Lu, Xinyuan

SCITAT: A Question Answering Benchmark for Scientific Tables and Text Covering Diverse Reasoning Types

Zhang, Xuanliang, Wang, Dingzirui, Wang, Baoxin, Dou, Longxu, Lu, Xinyuan, Xu, Keyan, Wu, Dayong, Zhu, Qingfu, Che, Wanxiang

arXiv.org Artificial Intelligence, Dec-16-2024

Scientific question answering (SQA) is an important task aimed at answering questions based on papers. However, current SQA datasets have limited reasoning types and neglect the relevance between tables and text, creating a significant gap with real scenarios. To address these challenges, we propose a QA benchmark for scientific tables and text with diverse reasoning types (SciTaT). To cover more reasoning types, we summarize various reasoning types from real-world questions. To involve both tables and text, we require the questions to incorporate tables and text as much as possible. Based on SciTaT, we propose a strong baseline (CaR), which combines various reasoning methods to address different reasoning types and process tables and text at the same time. CaR brings average improvements of 12.9% over other baselines on SciTaT, validating its effectiveness. Error analysis reveals the challenges of SciTaT, such as complex numerical calculations and domain knowledge.

large language model, machine learning, reasoning type, (21 more...)

arXiv:2412.11757

Country:

North America > United States (0.28)
Asia > China (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.85)
(2 more...)

MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations

Ma, Yubo, Zang, Yuhang, Chen, Liangyu, Chen, Meiqi, Jiao, Yizhu, Li, Xinze, Lu, Xinyuan, Liu, Ziyu, Ma, Yan, Dong, Xiaoyi, Zhang, Pan, Pan, Liangming, Jiang, Yu-Gang, Wang, Jiaqi, Cao, Yixin, Sun, Aixin

arXiv.org Artificial Intelligence, Jul-10-2024

Understanding documents with rich layouts and multi-modal components is a long-standing and practical task. Recent Large Vision-Language Models (LVLMs) have made remarkable strides in various tasks, particularly in single-page document understanding (DU). However, their abilities on long-context DU remain an open problem. This work presents MMLongBench-Doc, a long-context, multi-modal benchmark comprising 1,062 expert-annotated questions. Distinct from previous datasets, it is constructed upon 130 lengthy PDF-formatted documents with an average of 49.4 pages and 20,971 textual tokens. Towards comprehensive evaluation, answers to these questions rely on pieces of evidence from (1) different sources (text, image, chart, table, and layout structure) and (2) various locations (i.e. page number). Moreover, 33.2% of the questions are cross-page questions requiring evidence across multiple pages. 22.8% of the questions are designed to be unanswerable for detecting potential hallucinations. Experiments on 14 LVLMs demonstrate that long-context DU greatly challenges current models. Notably, the best-performing model, GPT-4o, achieves an F1 score of only 42.7%, while the second-best, GPT-4V, scores 31.4%. Furthermore, 12 LVLMs (all except GPT-4o and GPT-4V) even present worse performance than their LLM counterparts which are fed with lossy-parsed OCR documents. These results validate the necessity of future research toward more capable long-context LVLMs. Project Page: https://mayubo2333.github.io/MMLongBench-Doc

large language model, lvlm, machine learning, (20 more...)

arXiv:2407.01523

Country:

North America > United States (1.00)
Asia (1.00)
Europe (0.67)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment (0.93)
Law (0.68)
Health & Medicine (0.67)
Education > Educational Setting (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

SCITAB: A Challenging Benchmark for Compositional Reasoning and Claim Verification on Scientific Tables

Lu, Xinyuan, Pan, Liangming, Liu, Qian, Nakov, Preslav, Kan, Min-Yen

arXiv.org Artificial Intelligence, Oct-23-2023

Current scientific fact-checking benchmarks exhibit several shortcomings, such as biases arising from crowd-sourced claims and an over-reliance on text-based evidence. We present SCITAB, a challenging evaluation dataset consisting of 1.2K expert-verified scientific claims that 1) originate from authentic scientific publications and 2) require compositional reasoning for verification. The claims are paired with evidence-containing scientific tables annotated with labels. Through extensive evaluations, we demonstrate that SCITAB poses a significant challenge to state-of-the-art models, including table-based pretraining models and large language models. All models except GPT-4 achieved performance barely above random guessing. Popular prompting techniques, such as Chain-of-Thought, do not yield substantial performance gains on SCITAB. Our analysis uncovers several unique challenges posed by SCITAB, including table grounding, claim ambiguity, and compositional reasoning. Our code and data are publicly available at https://github.com/XinyuanLu00/SciTab.

large language model, machine learning, natural language, (23 more...)

arXiv:2305.13186

Country:

Asia (0.14)
North America > United States > California (0.14)

Genre: Research Report > Experimental Study (0.46)

Industry: Education (0.67)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

QACHECK: A Demonstration System for Question-Guided Multi-Hop Fact-Checking

Pan, Liangming, Lu, Xinyuan, Kan, Min-Yen, Nakov, Preslav

arXiv.org Artificial Intelligence, Oct-11-2023

Fact-checking real-world claims often requires complex, multi-step reasoning due to the absence of direct evidence to support or refute them. However, existing fact-checking systems often lack transparency in their decision-making, making it challenging for users to comprehend their reasoning process. To address this, we propose the Question-guided Multi-hop Fact-Checking (QACHECK) system, which guides the model's reasoning process by asking a series of questions critical for verifying a claim. QACHECK has five key modules: a claim verifier, a question generator, a question-answering module, a QA validator, and a reasoner. Users can input a claim into QACHECK, which then predicts its veracity and provides a comprehensive report detailing its reasoning process, guided by a sequence of (question, answer) pairs. QACHECK also provides the source of evidence supporting each question, fostering a transparent, explainable, and user-friendly fact-checking process. A recorded video of QACHECK is at https://www.youtube.com/watch?v=ju8kxSldM64
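The abstract names five modules but does not spell out how they interact. The following is a minimal sketch of one plausible control loop, assuming the claim verifier decides when enough context has been gathered and the reasoner issues the final verdict. The function `qa_guided_check`, its parameter names, and the toy stubs are all hypothetical and are not taken from the QACHECK system.

```python
from typing import Callable, List, Tuple

QAPair = Tuple[str, str]

def qa_guided_check(
    claim: str,
    has_enough_context: Callable[[str, List[QAPair]], bool],  # claim verifier
    next_question: Callable[[str, List[QAPair]], str],        # question generator
    answer: Callable[[str], str],                             # question-answering module
    is_useful: Callable[[str, QAPair, List[QAPair]], bool],   # QA validator
    decide: Callable[[str, List[QAPair]], bool],              # reasoner
    max_steps: int = 5,
) -> Tuple[bool, List[QAPair]]:
    """Collect (question, answer) pairs until the verifier is satisfied, then decide."""
    context: List[QAPair] = []
    for _ in range(max_steps):
        if has_enough_context(claim, context):
            break
        q = next_question(claim, context)
        pair = (q, answer(q))
        # Keep the pair only if the validator accepts it (assumed behaviour).
        if is_useful(claim, pair, context):
            context.append(pair)
    return decide(claim, context), context

# Toy stubs standing in for the real models (hypothetical data and logic).
kb = {"Where was Ada Lovelace born?": "London"}
verdict, report = qa_guided_check(
    claim="Ada Lovelace was born in London.",
    has_enough_context=lambda c, ctx: len(ctx) >= 1,
    next_question=lambda c, ctx: "Where was Ada Lovelace born?",
    answer=lambda q: kb.get(q, "unknown"),
    is_useful=lambda c, pair, ctx: pair[1] != "unknown" and pair not in ctx,
    decide=lambda c, ctx: any(a in c for _, a in ctx),
)
print(verdict, report)
```

Passing the modules as callables keeps the loop independent of any particular model, and the returned list of pairs plays the role of the reasoning report described above.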

artificial intelligence, demonstration system, question-guided multi-hop fact-checking, (1 more...)

arXiv:2310.07609

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence (1.00)

Fact-Checking Complex Claims with Program-Guided Reasoning

Pan, Liangming, Wu, Xiaobao, Lu, Xinyuan, Luu, Anh Tuan, Wang, William Yang, Kan, Min-Yen, Nakov, Preslav

arXiv.org Artificial Intelligence, May-22-2023

Fact-checking real-world claims often requires collecting multiple pieces of evidence and applying complex multi-step reasoning. In this paper, we present Program-Guided Fact-Checking (ProgramFC), a novel fact-checking model that decomposes complex claims into simpler sub-tasks that can be solved using a shared library of specialized functions. We first leverage the in-context learning ability of large language models to generate reasoning programs to guide the verification process. Afterward, we execute the program by delegating each sub-task to the corresponding sub-task handler. This process makes our model both explanatory and data-efficient, providing clear explanations of its reasoning process and requiring minimal training data. We evaluate ProgramFC on two challenging fact-checking datasets and show that it outperforms seven fact-checking baselines across different settings of evidence availability, with explicit output programs that benefit human debugging. Our code and data are publicly available at https://github.com/mbzuai-nlp/ProgramFC.
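To illustrate the idea of executing a reasoning program by delegating each step to a handler, here is a minimal sketch. It assumes a program is a list of (variable, handler name, argument template) steps and uses a toy lookup table in place of real models. The handler names, the `FACTS` table, and the program format are hypothetical and are not taken from the ProgramFC code base.

```python
from typing import Callable, Dict, List, Tuple

# Toy evidence store standing in for a retriever or language model (hypothetical).
FACTS = {
    "Who directed Film A?": "Director X",
    "Director X was born in 1970.": True,
}

def question(arg: str) -> str:
    """Answer a sub-question by lookup (stand-in for a QA model)."""
    return str(FACTS.get(arg, "unknown"))

def verify(arg: str) -> bool:
    """Check a simple sub-claim by lookup (stand-in for a verifier)."""
    return FACTS.get(arg) is True

# Shared library of sub-task handlers, keyed by name.
HANDLERS: Dict[str, Callable[[str], object]] = {"Question": question, "Verify": verify}

# A reasoning program: (output variable, handler name, argument template).
Program = List[Tuple[str, str, str]]

def execute(program: Program) -> bool:
    """Run each step in order, substituting earlier results into later arguments."""
    env: Dict[str, object] = {}
    for var, handler, template in program:
        arg = template.format(**env)
        env[var] = HANDLERS[handler](arg)
    # The claim counts as supported only if every Verify step returned True.
    return all(v for v in env.values() if isinstance(v, bool))

program: Program = [
    ("answer_1", "Question", "Who directed Film A?"),
    ("fact_1", "Verify", "{answer_1} was born in 1970."),
]
print(execute(program))
```

Because every intermediate value is stored under a named variable, the executed program doubles as a readable trace of the verification steps.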

def program, machine learning, natural language, (17 more...)

arXiv:2305.12744

Country:

Europe (1.00)
Asia (1.00)
North America > United States > California (0.67)

Genre:

Workflow (0.93)
Research Report (0.82)

Industry:

Health & Medicine (0.93)
Leisure & Entertainment > Sports > Motorsports (0.93)
Media > Film (0.68)
Leisure & Entertainment > Sports > Hockey (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.67)

Learning to Generate Questions with Adaptive Copying Neural Networks

Lu, Xinyuan, Guo, Yuhong

arXiv.org Machine Learning, Sep-17-2019

Automatic question generation is an important problem in natural language processing. In this paper, we propose a novel adaptive copying recurrent neural network model to tackle the problem of question generation from sentences and paragraphs. The proposed model adds a copying mechanism component onto a bidirectional LSTM architecture to generate more suitable questions adaptively from the input data. Our experimental results show that the proposed model can outperform state-of-the-art question generation methods in terms of BLEU and ROUGE evaluation scores.
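The abstract does not give the exact form of the copying component. As a rough illustration, a generic copy mechanism can be sketched as a gated mixture of a vocabulary distribution and an attention distribution over source tokens. The function `mix_copy_and_generate`, the scalar `gate`, and the toy numbers are assumptions for illustration and may differ from the paper's formulation.

```python
from typing import Dict, List

def mix_copy_and_generate(
    p_vocab: Dict[str, float],
    attention: List[float],
    source_tokens: List[str],
    gate: float,
) -> Dict[str, float]:
    """Blend generation and copy probabilities.

    gate is the probability of generating from the vocabulary;
    (1 - gate) is the probability of copying a source token.
    """
    # Scale the vocabulary distribution by the generation probability.
    mixed = {word: gate * prob for word, prob in p_vocab.items()}
    # Add the copy mass of each source position to its token.
    for token, weight in zip(source_tokens, attention):
        mixed[token] = mixed.get(token, 0.0) + (1.0 - gate) * weight
    return mixed

# Toy example: "Paris" is out of vocabulary but can still be produced by copying.
p_vocab = {"what": 0.7, "is": 0.3}
source_tokens = ["Paris", "is"]
attention = [0.25, 0.75]
mixed = mix_copy_and_generate(p_vocab, attention, source_tokens, gate=0.5)
print(mixed)
```

If both input distributions sum to one, the mixture does as well, so the result can be used directly as the next-token distribution.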

deep learning, neural network, question generation, (19 more...)

arXiv:1909.08187

Country: North America > Canada (0.14)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
