Pan, Lin
PRACTIQ: A Practical Conversational Text-to-SQL dataset with Ambiguous and Unanswerable Queries
Dong, Mingwen, Kumar, Nischal Ashok, Hu, Yiqun, Chauhan, Anuj, Hang, Chung-Wei, Chang, Shuaichen, Pan, Lin, Lan, Wuwei, Zhu, Henghui, Jiang, Jiarong, Ng, Patrick, Wang, Zhiguo
Previous text-to-SQL datasets and systems have primarily focused on user questions with clear intentions that can be answered. However, real user questions are often ambiguous, with multiple possible interpretations, or unanswerable due to a lack of relevant data. In this work, we construct a practical conversational text-to-SQL dataset called PRACTIQ, consisting of ambiguous and unanswerable questions inspired by real-world user questions. We first identify four categories of ambiguous questions and four categories of unanswerable questions by studying existing text-to-SQL datasets. We then generate conversations with four turns: the initial user question, an assistant response seeking clarification, the user's clarification, and the assistant's clarified SQL response with a natural language explanation of the execution results. For some ambiguous queries, we also directly generate helpful SQL responses that consider multiple aspects of the ambiguity, instead of requesting user clarification. To benchmark performance on ambiguous, unanswerable, and answerable questions, we implement large language model (LLM)-based baselines using various LLMs. Our approach involves two steps: question category classification and clarification SQL prediction. Our experiments reveal that state-of-the-art systems struggle to handle ambiguous and unanswerable questions effectively. We will release our code for data generation and experiments on GitHub.
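As a rough illustration of the two-step setup the abstract describes (category classification followed by clarification or SQL prediction), the sketch below shows one way such an LLM-based baseline could be wired together. The prompts, category labels, and the `call_llm` helper are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of a two-step LLM baseline for PRACTIQ-style questions:
# step 1 classifies the question, step 2 either asks for clarification or emits SQL.
# `call_llm` is a placeholder for any chat-completion API.

CATEGORIES = ["answerable", "ambiguous", "unanswerable"]  # assumed label set

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in an LLM client here")

def classify_question(question: str, schema: str) -> str:
    prompt = (
        "Given the database schema and a user question, label the question as one of "
        f"{CATEGORIES}.\n\nSchema:\n{schema}\n\nQuestion: {question}\nLabel:"
    )
    label = call_llm(prompt).strip().lower()
    return label if label in CATEGORIES else "answerable"

def respond(question: str, schema: str) -> dict:
    label = classify_question(question, schema)
    if label == "unanswerable":
        return {"type": "no_answer", "message": "The schema has no data to answer this."}
    if label == "ambiguous":
        clarification = call_llm(
            f"Schema:\n{schema}\n\nQuestion: {question}\n"
            "Ask one short clarification question that resolves the ambiguity:"
        )
        return {"type": "clarify", "message": clarification}
    sql = call_llm(f"Schema:\n{schema}\n\nWrite a SQL query answering: {question}\nSQL:")
    return {"type": "sql", "sql": sql}
```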
Towards a Holistic Evaluation of LLMs on Factual Knowledge Recall
Yuan, Jiaqing, Pan, Lin, Hang, Chung-Wei, Guo, Jiang, Jiang, Jiarong, Min, Bonan, Ng, Patrick, Wang, Zhiguo
Large language models (LLMs) have shown remarkable performance on a variety of NLP tasks and are being rapidly adopted in a wide range of use cases. It is therefore of vital importance to holistically evaluate the factuality of their generated outputs, as hallucinations remain a challenging issue. In this work, we focus on assessing LLMs' ability to recall factual knowledge learned from pretraining, and the factors that affect this ability. To that end, we construct FACT-BENCH, a representative benchmark covering 20 domains, 134 property types, 3 answer types, and different knowledge popularity levels. We benchmark 31 models from 10 model families and provide a holistic assessment of their strengths and weaknesses. We observe that instruction-tuning hurts knowledge recall, as pretraining-only models consistently outperform their instruction-tuned counterparts, and that model scaling helps, as larger models outperform smaller ones in all model families. However, even the best performance, from GPT-4, still leaves a large gap to the upper bound. We additionally study the role of in-context exemplars using counterfactual demonstrations, which lead to significant degradation of factual knowledge recall for large models. By further decoupling a model's known and unknown knowledge, we find that the degradation is attributed to exemplars that contradict the model's known knowledge, as well as to the number of such exemplars. Lastly, we fine-tune LLaMA-7B in different settings of known and unknown knowledge. In particular, fine-tuning on a model's known knowledge is beneficial and consistently outperforms fine-tuning on unknown and mixed knowledge. We will make our benchmark publicly available.
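To make the counterfactual-demonstration study concrete, here is a minimal sketch of how one might probe factual recall with deliberately wrong in-context exemplars and measure the resulting accuracy degradation. The prompt format and function names are assumptions for illustration, not the FACT-BENCH code.

```python
# Illustrative sketch of probing factual recall with counterfactual in-context
# exemplars: exemplar answers are deliberately wrong, and we compare accuracy
# against the same prompt built with correct exemplars.

def build_prompt(exemplars, question, counterfactual=False):
    lines = []
    for q, true_answer, wrong_answer in exemplars:
        answer = wrong_answer if counterfactual else true_answer
        lines.append(f"Q: {q}\nA: {answer}")
    lines.append(f"Q: {question}\nA:")
    return "\n\n".join(lines)

def recall_accuracy(model_fn, dataset, exemplars, counterfactual=False):
    """`model_fn` maps a prompt string to the model's text completion."""
    correct = 0
    for question, gold in dataset:
        prediction = model_fn(build_prompt(exemplars, question, counterfactual)).strip()
        correct += int(gold.lower() in prediction.lower())
    return correct / len(dataset)
```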
Curvilinear object segmentation in medical images based on ODoS filter and deep learning network
Peng, Yuanyuan, Pan, Lin, Luan, Pengpeng, Tu, Hongbin, Li, Xiong
Automatic segmentation of curvilinear objects in medical images plays an important role in the diagnosis and evaluation of human diseases, yet it remains a challenging task due to issues such as varied image appearances, low contrast between curvilinear objects and their surrounding backgrounds, thin and uneven curvilinear structures, and improper background illumination conditions. To overcome these challenges, we present a unique curvilinear structure segmentation framework based on an oriented derivative of stick (ODoS) filter and a deep learning network for curvilinear object segmentation in medical images. Currently, a large number of deep learning models emphasize developing deep architectures while ignoring the structural features of curvilinear objects, which may lead to unsatisfactory results. Consequently, we present a new approach that incorporates an ODoS filter into a deep learning network to improve spatial attention on curvilinear objects. Specifically, the input image is transformed into a four-channel image constructed by the ODoS filter, in which the original image is kept as the principal part to describe image appearance and complex background illumination conditions, a multi-step strategy is used to enhance the contrast between curvilinear objects and their surrounding backgrounds, and a vector field is applied to discriminate thin and uneven curvilinear structures. A deep learning framework is then employed to extract various structural features for curvilinear object segmentation in medical images. The performance of the model is validated in experiments on the publicly available DRIVE, STARE, and CHASEDB1 datasets. The experimental results indicate that the presented model yields promising results compared with some state-of-the-art methods.
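The sketch below shows one plausible way the four-channel input described above could be assembled (original image, a contrast-enhanced map, and a two-channel vector field) before being fed to a segmentation network. The split into these particular channels and the stubbed-out ODoS computation are assumptions, not a reproduction of the paper's filter.

```python
# Rough sketch of assembling a four-channel input for a segmentation network:
# the original image plus three ODoS-derived maps. The ODoS responses themselves
# are stubbed out here.
import numpy as np

def odos_responses(gray: np.ndarray):
    """Placeholder for the oriented-derivative-of-stick filter outputs:
    a contrast-enhanced map and a 2-channel vector field (not implemented)."""
    raise NotImplementedError

def build_four_channel_input(gray: np.ndarray) -> np.ndarray:
    enhanced, vector_field = odos_responses(gray)        # (H, W), (H, W, 2)
    channels = np.stack(
        [gray, enhanced, vector_field[..., 0], vector_field[..., 1]], axis=0
    )                                                     # (4, H, W)
    # Normalize each channel to [0, 1] before feeding the segmentation network.
    mins = channels.min(axis=(1, 2), keepdims=True)
    maxs = channels.max(axis=(1, 2), keepdims=True)
    return (channels - mins) / np.maximum(maxs - mins, 1e-8)
```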
UNITE: A Unified Benchmark for Text-to-SQL Evaluation
Lan, Wuwei, Wang, Zhiguo, Chauhan, Anuj, Zhu, Henghui, Li, Alexander, Guo, Jiang, Zhang, Sheng, Hang, Chung-Wei, Lilien, Joseph, Hu, Yiqun, Pan, Lin, Dong, Mingwen, Wang, Jun, Jiang, Jiarong, Ash, Stephen, Castelli, Vittorio, Ng, Patrick, Xiang, Bing
A practical text-to-SQL system should generalize well on a wide variety of natural language questions, unseen database schemas, and novel SQL query structures. To comprehensively evaluate text-to-SQL systems, we introduce a UNIfied benchmark for Text-to-SQL Evaluation (UNITE). It is composed of publicly available text-to-SQL datasets, containing natural language questions from more than 12 domains, SQL queries from more than 3.9K patterns, and 29K databases. Compared to the widely used Spider benchmark, we introduce ~120K additional examples and a threefold increase in SQL patterns, such as comparative and boolean questions. We conduct a systematic study of six state-of-the-art (SOTA) text-to-SQL parsers on our new benchmark and show that: 1) Codex performs surprisingly well on out-of-domain datasets; 2) specially designed decoding methods (e.g., constrained beam search) can improve performance in both in-domain and out-of-domain settings; 3) explicitly modeling the relationship between questions and schemas further improves Seq2Seq models. More importantly, our benchmark presents key challenges in compositional generalization and robustness that these SOTA models cannot address well. Our code and data processing scripts are available at https://github.com/awslabs/unified-text2sql-benchmark
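Since UNITE is assembled from heterogeneous public datasets, a natural implementation question is how examples are normalized into one record format. The sketch below shows one way that could look for Spider-style JSON files; the field names and dataclass are assumptions for illustration, not the released benchmark schema.

```python
# Hypothetical sketch of normalizing heterogeneous text-to-SQL datasets into a
# single unified record format before evaluation.
import json
from dataclasses import dataclass, asdict

@dataclass
class UnifiedExample:
    dataset: str   # source dataset name, e.g. "spider"
    db_id: str     # database identifier
    question: str  # natural language question
    sql: str       # gold SQL query

def load_spider_style(path: str, name: str):
    """Yield unified examples from a Spider-style JSON file."""
    with open(path) as f:
        for item in json.load(f):
            yield UnifiedExample(name, item["db_id"], item["question"], item["query"])

def merge(sources):
    """Flatten several example generators into one list of plain dicts."""
    return [asdict(ex) for src in sources for ex in src]
```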
Dr.Spider: A Diagnostic Evaluation Benchmark towards Text-to-SQL Robustness
Chang, Shuaichen, Wang, Jun, Dong, Mingwen, Pan, Lin, Zhu, Henghui, Li, Alexander Hanbo, Lan, Wuwei, Zhang, Sheng, Jiang, Jiarong, Lilien, Joseph, Ash, Steve, Wang, William Yang, Wang, Zhiguo, Castelli, Vittorio, Ng, Patrick, Xiang, Bing
Neural text-to-SQL models have achieved remarkable performance in translating natural language questions into SQL queries. However, recent studies reveal that text-to-SQL models are vulnerable to task-specific perturbations, and previously curated robustness test sets usually focus on individual phenomena. In this paper, we propose a comprehensive robustness benchmark based on Spider, a cross-domain text-to-SQL benchmark, to diagnose model robustness. We design 17 perturbations on databases, natural language questions, and SQL queries to measure robustness from different angles. To collect more diversified natural question perturbations, we utilize large pretrained language models (PLMs) to simulate human behaviors in creating natural questions. We conduct a diagnostic study of state-of-the-art models on the robustness set. Experimental results reveal that even the most robust model suffers from a 14.0% performance drop overall and a 50.7% performance drop on the most challenging perturbation. We also present a breakdown analysis of text-to-SQL model designs and provide insights for improving model robustness.
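The robustness numbers quoted above come from comparing a model's accuracy on original versus perturbed evaluation sets. The snippet below is an illustrative sketch of that measurement (not the Dr.Spider tooling); the `predict_sql` and `execute` callables and the example fields are assumptions.

```python
# Illustrative sketch of measuring a robustness gap: execution accuracy on an
# original Spider-style set versus a perturbed variant of the same set.

def execution_accuracy(predict_sql, examples, execute):
    """`predict_sql(question, db_id)` returns SQL; `execute(sql, db_id)` returns results."""
    correct = 0
    for ex in examples:
        pred = predict_sql(ex["question"], ex["db_id"])
        correct += int(execute(pred, ex["db_id"]) == execute(ex["gold_sql"], ex["db_id"]))
    return correct / len(examples)

def robustness_drop(predict_sql, original, perturbed, execute):
    acc_orig = execution_accuracy(predict_sql, original, execute)
    acc_pert = execution_accuracy(predict_sql, perturbed, execute)
    return acc_orig - acc_pert  # e.g. a drop of 0.14 corresponds to 14 accuracy points
```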
Importance of Synthesizing High-quality Data for Text-to-SQL Parsing
Zhao, Yiyun, Jiang, Jiarong, Hu, Yiqun, Lan, Wuwei, Zhu, Henry, Chauhan, Anuj, Li, Alexander, Pan, Lin, Wang, Jun, Hang, Chung-Wei, Zhang, Sheng, Dong, Marvin, Lilien, Joe, Ng, Patrick, Wang, Zhiguo, Castelli, Vittorio, Xiang, Bing
Recently, there has been increasing interest in synthesizing data to improve downstream text-to-SQL tasks. In this paper, we first examine existing synthesized datasets and discover that state-of-the-art text-to-SQL algorithms do not further improve on popular benchmarks when trained with augmented synthetic data. We observe two shortcomings: illogical synthetic SQL queries from independent column sampling, and arbitrary table joins. To address these issues, we propose a novel synthesis framework that incorporates key relationships from the schema, imposes strong typing, and conducts schema-distance-weighted column sampling. We also adopt an intermediate representation (IR) for the SQL-to-text task to further improve the quality of the generated natural language questions. When existing powerful semantic parsers are pre-finetuned on our high-quality synthesized data, our experiments show that these models achieve significant accuracy boosts on popular benchmarks, including new state-of-the-art performance on Spider.
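To give a sense of what schema-distance-weighted column sampling could look like in practice, here is a hedged sketch: when adding a column to a synthetic query, columns from tables that are close (in join hops) to the tables already used are preferred, which discourages arbitrary joins. The inverse-distance weighting and data structures are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch of schema-distance-weighted column sampling over a foreign-key graph.
import random
from collections import deque

def join_distances(schema, start_table):
    """BFS over the foreign-key graph; `schema[table]` lists directly joinable tables."""
    dist, queue = {start_table: 0}, deque([start_table])
    while queue:
        t = queue.popleft()
        for nb in schema.get(t, []):
            if nb not in dist:
                dist[nb] = dist[t] + 1
                queue.append(nb)
    return dist

def sample_column(schema, columns_by_table, anchor_table):
    dist = join_distances(schema, anchor_table)
    candidates, weights = [], []
    for table, cols in columns_by_table.items():
        d = dist.get(table)
        if d is None:
            continue  # tables unreachable via joins are never sampled
        for col in cols:
            candidates.append((table, col))
            weights.append(1.0 / (1 + d))  # closer tables get higher weight
    return random.choices(candidates, weights=weights, k=1)[0]
```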
Span Selection Pre-training for Question Answering
Glass, Michael, Gliozzo, Alfio, Chakravarti, Rishav, Ferritto, Anthony, Pan, Lin, Bhargav, G P Shrivatsa, Garg, Dinesh, Sil, Avirup
BERT (Bidirectional Encoder Representations from Transformers) and related pre-trained Transformers have provided large gains across many language understanding tasks, achieving a new state-of-the-art (SOTA). BERT is pre-trained on two auxiliary tasks: Masked Language Model and Next Sentence Prediction. In this paper, we introduce a new pre-training task inspired by reading comprehension and by an effort to avoid encoding general knowledge in the transformer network itself. We find significant and consistent improvements over both BERT-BASE and BERT-LARGE on multiple machine reading comprehension (MRC) and paraphrasing datasets. Specifically, our proposed model obtains SOTA results on Natural Questions, a new benchmark MRC dataset, outperforming BERT-LARGE by 3 F1 points on short answer prediction. We also establish a new SOTA on HotpotQA, improving answer prediction by 4 F1 points and supporting fact prediction by 1 F1 point. Moreover, we show that our pre-training approach is particularly effective when training data is limited, improving the learning curve by a large amount.
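As a conceptual illustration of a reading-comprehension-inspired pre-training task, the sketch below builds an instance by blanking a span in a sentence and pairing it with a passage that contains that span, so the model must select the answer from the passage rather than store it in its parameters. The construction heuristics here are assumptions for illustration, not the paper's data pipeline.

```python
# Conceptual sketch of building a span-selection pre-training instance.
import random

BLANK = "[BLANK]"

def make_span_selection_example(sentence: str, passage: str, max_span_len: int = 3):
    tokens = sentence.split()
    for _ in range(20):  # retry until the sampled span also occurs in the passage
        span_len = random.randint(1, max_span_len)
        start = random.randint(0, max(0, len(tokens) - span_len))
        span = " ".join(tokens[start:start + span_len])
        if span and span in passage:
            query = " ".join(tokens[:start] + [BLANK] + tokens[start + span_len:])
            return {"query": query, "passage": passage, "answer": span}
    return None  # no suitable span found
```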