Ng, Patrick
CoddLLM: Empowering Large Language Models for Data Analytics
Zhang, Jiani, Zhang, Hengrui, Chakravarti, Rishav, Hu, Yiqun, Ng, Patrick, Katsifodimos, Asterios, Rangwala, Huzefa, Karypis, George, Halevy, Alon
Large Language Models (LLMs) have the potential to revolutionize data analytics by simplifying tasks such as data discovery and SQL query synthesis through natural language interactions. This work serves as a pivotal first step toward the development of foundation models explicitly designed for data analytics applications. To propel this vision forward, we unveil a new data recipe for post-training LLMs, enhancing their comprehension of data management and empowering them to tackle complex real-world analytics tasks. Specifically, our innovative approach includes a scalable synthetic data generation method that enables the creation of a broad spectrum of topics centered on data representation and manipulation. Furthermore, we introduce two new tasks that seamlessly bridge tables and text. We show that such tasks can enhance models' understanding of schema creation and the nuanced translation between natural language and tabular data. Leveraging this data recipe, we post-train a new foundation model, named CoddLLM, based on Mistral-NeMo-12B. To assess the language understanding and reasoning capabilities of LLMs in the realm of data analytics, we contribute AnalyticsMMLU, a benchmark containing thousands of multiple-choice questions on databases, data analysis, and machine learning. Our focus on data discovery has also led to three comprehensive benchmarks that address both database and data lake scenarios. CoddLLM not only excels in performance but also sets a new standard, achieving the highest average accuracy across eight datasets. It outperforms GPT-3.5-Turbo on AnalyticsMMLU, exceeds GPT-4o by 12.1% in table selection, and shows an average improvement of 24.9% in Text-to-SQL compared to the base model.
Towards Better Understanding Table Instruction Tuning: Decoupling the Effects from Data versus Models
Deng, Naihao, Zhang, Sheng, Zhu, Henghui, Chang, Shuaichen, Zhang, Jiani, Li, Alexander Hanbo, Hang, Chung-Wei, Kobayashi, Hideo, Hu, Yiqun, Ng, Patrick
Recent advances in natural language processing have leveraged instruction tuning to enhance Large Language Models (LLMs) for table-related tasks. However, previous works train different base models with different training data, lacking an apples-to-apples comparison across the resulting table LLMs. To address this, we fine-tune base models from the Mistral, OLMo, and Phi families on existing public training datasets. Our replication achieves performance on par with or surpassing existing table LLMs, establishing new state-of-the-art performance on HiTab, a table question-answering dataset. More importantly, through systematic out-of-domain evaluation, we decouple the contributions of the training data and the base model, providing insight into their individual impacts. In addition, we assess the effects of table-specific instruction tuning on general-purpose benchmarks, revealing trade-offs between specialization and generalization.
PRACTIQ: A Practical Conversational Text-to-SQL dataset with Ambiguous and Unanswerable Queries
Dong, Mingwen, Kumar, Nischal Ashok, Hu, Yiqun, Chauhan, Anuj, Hang, Chung-Wei, Chang, Shuaichen, Pan, Lin, Lan, Wuwei, Zhu, Henghui, Jiang, Jiarong, Ng, Patrick, Wang, Zhiguo
Previous text-to-SQL datasets and systems have primarily focused on user questions with clear intentions that can be answered. However, real user questions can often be ambiguous, with multiple interpretations, or unanswerable due to a lack of relevant data. In this work, we construct a practical conversational text-to-SQL dataset called PRACTIQ, consisting of ambiguous and unanswerable questions inspired by real-world user questions. We first identify four categories of ambiguous questions and four categories of unanswerable questions by studying existing text-to-SQL datasets. Then, we generate conversations with four turns: the initial user question, an assistant response seeking clarification, the user's clarification, and the assistant's clarified SQL response with a natural language explanation of the execution results. For some ambiguous queries, we also directly generate helpful SQL responses that consider multiple aspects of ambiguity instead of requesting user clarification. To benchmark performance on ambiguous, unanswerable, and answerable questions, we implement large language model (LLM)-based baselines using various LLMs. Our approach involves two steps: question category classification and clarification SQL prediction. Our experiments reveal that state-of-the-art systems struggle to handle ambiguous and unanswerable questions effectively. We will release our code for data generation and experiments on GitHub.
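As a rough illustration of the two-step baseline described above (question category classification followed by clarification or SQL prediction), here is a minimal Python sketch. The call_llm callable, the category names, and the prompts are placeholders for whatever chat-completion API and label set one uses; they are not the paper's actual implementation.

# Minimal sketch of a two-step baseline for PRACTIQ-style conversations.
# `call_llm` is a placeholder for any chat-completion API; the prompts and
# category names below are illustrative, not the paper's exact ones.
from typing import Callable

CATEGORIES = ["answerable", "ambiguous", "unanswerable"]

def classify_question(question: str, schema: str, call_llm: Callable[[str], str]) -> str:
    """Step 1: ask the LLM which category the user question falls into."""
    prompt = (
        f"Database schema:\n{schema}\n\n"
        f"User question: {question}\n"
        f"Classify the question as one of {CATEGORIES}. Answer with one word."
    )
    label = call_llm(prompt).strip().lower()
    return label if label in CATEGORIES else "answerable"

def respond(question: str, schema: str, call_llm: Callable[[str], str]) -> str:
    """Step 2: either produce SQL, ask for clarification, or decline."""
    category = classify_question(question, schema, call_llm)
    if category == "answerable":
        return call_llm(f"Schema:\n{schema}\nWrite a SQL query for: {question}")
    if category == "ambiguous":
        return call_llm(
            f"Schema:\n{schema}\nThe question '{question}' is ambiguous. "
            "Ask the user a single clarification question."
        )
    return "This question cannot be answered with the available tables."

In this sketch the same model handles both steps; in practice the classifier and the SQL generator could be prompted or fine-tuned separately.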
Towards a Holistic Evaluation of LLMs on Factual Knowledge Recall
Yuan, Jiaqing, Pan, Lin, Hang, Chung-Wei, Guo, Jiang, Jiang, Jiarong, Min, Bonan, Ng, Patrick, Wang, Zhiguo
Large language models (LLMs) have shown remarkable performance on a variety of NLP tasks and are being rapidly adopted in a wide range of use cases. It is therefore of vital importance to holistically evaluate the factuality of their generated outputs, as hallucinations remain a challenging issue. In this work, we focus on assessing LLMs' ability to recall factual knowledge learned during pretraining, and the factors that affect this ability. To that end, we construct FACT-BENCH, a representative benchmark covering 20 domains, 134 property types, 3 answer types, and different knowledge popularity levels. We benchmark 31 models from 10 model families and provide a holistic assessment of their strengths and weaknesses. We observe that instruction tuning hurts knowledge recall, as pretraining-only models consistently outperform their instruction-tuned counterparts, and that model scaling helps, as larger models outperform smaller ones in all model families. However, even the best performance, from GPT-4, still leaves a large gap to the upper bound. We additionally study the role of in-context exemplars using counterfactual demonstrations, which lead to significant degradation of factual knowledge recall for large models. By further decoupling a model's known and unknown knowledge, we find the degradation is attributable to exemplars that contradict the model's known knowledge, and that it grows with the number of such exemplars. Lastly, we fine-tune LLaMA-7B in different settings of known and unknown knowledge. We find that fine-tuning on a model's known knowledge is beneficial and consistently outperforms fine-tuning on unknown or mixed knowledge. We will make our benchmark publicly available.
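To make the counterfactual-exemplar experiment concrete, the sketch below prompts a model with few-shot exemplars and measures exact-match recall. The prompt template, the Fact triple layout, and the model callable are assumptions made for illustration, not the FACT-BENCH format.

# Illustrative probe of factual recall with factual vs. counterfactual
# in-context exemplars. `model` is a placeholder for any text-completion API;
# the triples and prompt template are invented for this sketch.
from typing import Callable, List, Tuple

Fact = Tuple[str, str, str]  # (subject, property, gold answer)

def build_prompt(exemplars: List[Fact], query: Tuple[str, str]) -> str:
    """Format few-shot exemplars (factual or deliberately wrong) plus the query."""
    lines = [f"Q: What is the {p} of {s}? A: {a}" for s, p, a in exemplars]
    subject, prop = query
    lines.append(f"Q: What is the {prop} of {subject}? A:")
    return "\n".join(lines)

def recall_accuracy(model: Callable[[str], str],
                    exemplars: List[Fact],
                    test_facts: List[Fact]) -> float:
    """Exact-match accuracy of the model's recall under a given exemplar set."""
    hits = 0
    for subject, prop, gold in test_facts:
        prediction = model(build_prompt(exemplars, (subject, prop))).strip()
        hits += int(prediction.lower() == gold.lower())
    return hits / max(len(test_facts), 1)

Comparing the accuracy obtained with truthful exemplars against exemplars whose answers have been swapped for wrong ones reproduces, at toy scale, the degradation effect described above.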
Propagation and Pitfalls: Reasoning-based Assessment of Knowledge Editing through Counterfactual Tasks
Hua, Wenyue, Guo, Jiang, Dong, Mingwen, Zhu, Henghui, Ng, Patrick, Wang, Zhiguo
Current approaches to knowledge editing struggle to effectively propagate updates to interconnected facts. In this work, we delve into the barriers that hinder the appropriate propagation of updated knowledge within these models for accurate reasoning. To support our analysis, we introduce a novel reasoning-based benchmark -- ReCoE (Reasoning-based Counterfactual Editing dataset) -- which covers six common real-world reasoning schemes. We conduct a thorough analysis of existing knowledge editing techniques, including input augmentation, fine-tuning, and locate-and-edit. We find that all model editing methods perform notably poorly on this dataset, especially within certain reasoning schemes. Our analysis of the chain-of-thought generations of edited models further uncovers key reasons behind the inadequacy of existing knowledge editing methods from a reasoning standpoint, involving fact-wise editing, fact recall ability, and coherence in generation. We will make our benchmark publicly available.
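The sketch below illustrates the propagation criterion behind this kind of evaluation: after a counterfactual edit, a question that reasons over the edited fact should also change its answer. The helper name, the query_model callable, and the example facts are hypothetical and not part of ReCoE itself.

# Toy check of whether a knowledge edit propagates through reasoning.
# `query_model` stands in for querying the edited model; the questions and
# expected answers would come from a counterfactual editing dataset.
from typing import Callable, Dict

def check_propagation(query_model: Callable[[str], str],
                      edited_fact_question: str, edited_fact_answer: str,
                      dependent_question: str, dependent_answer: str) -> Dict[str, bool]:
    """An edit counts as propagated only if the dependent (multi-hop) answer updates too."""
    return {
        "edit_recalled": query_model(edited_fact_question).strip() == edited_fact_answer,
        "edit_propagated": query_model(dependent_question).strip() == dependent_answer,
    }

For example (hypothetically), after editing "the capital of country X" to a new city, a dependent question such as "in which city does the head of government of X reside?" should reflect the new capital; ReCoE covers six such reasoning schemes.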
Few-Shot Data-to-Text Generation via Unified Representation and Multi-Source Learning
Li, Alexander Hanbo, Shang, Mingyue, Spiliopoulou, Evangelia, Ma, Jie, Ng, Patrick, Wang, Zhiguo, Min, Bonan, Wang, William, McKeown, Kathleen, Castelli, Vittorio, Roth, Dan, Xiang, Bing
We present a novel approach for structured data-to-text generation that addresses the limitations of existing methods, which primarily focus on specific types of structured data. Our proposed method aims to improve performance in multi-task training, zero-shot, and few-shot scenarios by providing a unified representation that can handle various forms of structured data, such as tables, knowledge graph triples, and meaning representations. We demonstrate that our proposed approach can effectively adapt to new structured forms and improves performance compared to current methods. For example, our method resulted in a 66% improvement in zero-shot BLEU scores when transferring models trained on table inputs to a knowledge graph dataset. Our proposed method is an important step towards a more general data-to-text generation framework.
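As an illustration of what a unified textual representation of heterogeneous inputs might look like, here is a toy linearizer for tables, knowledge graph triples, and slot-based meaning representations. The tag names and key-value layout are invented for this sketch and are not the paper's actual format.

# Toy linearizers mapping different structured inputs to one tagged,
# key-value style text representation that a single seq2seq model can consume.
from typing import Dict, Sequence, Tuple

def linearize_table(header: Sequence[str], rows: Sequence[Sequence[str]]) -> str:
    cells = [" | ".join(f"{h}: {v}" for h, v in zip(header, row)) for row in rows]
    return "[TABLE] " + " [ROW] ".join(cells)

def linearize_triples(triples: Sequence[Tuple[str, str, str]]) -> str:
    return "[GRAPH] " + " ".join(f"[H] {h} [R] {r} [T] {t}" for h, r, t in triples)

def linearize_mr(slots: Dict[str, str]) -> str:
    return "[MR] " + " ".join(f"[SLOT] {k} = {v}" for k, v in slots.items())

Because all three outputs share the same tagged key-value shape, a model trained on one structured format (e.g., tables) has a better chance of transferring zero-shot to another (e.g., knowledge graph triples).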
UNITE: A Unified Benchmark for Text-to-SQL Evaluation
Lan, Wuwei, Wang, Zhiguo, Chauhan, Anuj, Zhu, Henghui, Li, Alexander, Guo, Jiang, Zhang, Sheng, Hang, Chung-Wei, Lilien, Joseph, Hu, Yiqun, Pan, Lin, Dong, Mingwen, Wang, Jun, Jiang, Jiarong, Ash, Stephen, Castelli, Vittorio, Ng, Patrick, Xiang, Bing
A practical text-to-SQL system should generalize well on a wide variety of natural language questions, unseen database schemas, and novel SQL query structures. To comprehensively evaluate text-to-SQL systems, we introduce a UNIfied benchmark for Text-to-SQL Evaluation (UNITE). It is composed of publicly available text-to-SQL datasets, containing natural language questions from more than 12 domains, SQL queries from more than 3.9K patterns, and 29K databases. Compared to the widely used Spider benchmark, we introduce $\sim$120K additional examples and a threefold increase in SQL patterns, such as comparative and boolean questions. We conduct a systematic study of six state-of-the-art (SOTA) text-to-SQL parsers on our new benchmark and show that: 1) Codex performs surprisingly well on out-of-domain datasets; 2) specially designed decoding methods (e.g. constrained beam search) can improve performance for both in-domain and out-of-domain settings; 3) explicitly modeling the relationship between questions and schemas further improves the Seq2Seq models. More importantly, our benchmark presents key challenges towards compositional generalization and robustness issues -- which these SOTA models cannot address well. Our code and data processing script are available at https://github.com/awslabs/unified-text2sql-benchmark
Generate then Select: Open-ended Visual Question Answering Guided by World Knowledge
Fu, Xingyu, Zhang, Sheng, Kwon, Gukyeong, Perera, Pramuditha, Zhu, Henghui, Zhang, Yuhao, Li, Alexander Hanbo, Wang, William Yang, Wang, Zhiguo, Castelli, Vittorio, Ng, Patrick, Roth, Dan, Xiang, Bing
The open-ended Visual Question Answering (VQA) task requires AI models to jointly reason over visual and natural language inputs using world knowledge. Recently, pre-trained Language Models (PLMs) such as GPT-3 have been applied to the task and shown to be powerful world knowledge sources. However, these methods suffer from low knowledge coverage caused by PLM bias -- the tendency to generate certain tokens over others regardless of prompt changes -- and from a high dependency on PLM quality -- only models using GPT-3 can achieve the best result. To address these challenges, we propose RASO: a new VQA pipeline that, for the first time, deploys a generate-then-select strategy guided by world knowledge. Rather than following the de facto standard of training a multi-modal model that directly generates the VQA answer, RASO first uses a PLM to generate all possible answers, and then trains a lightweight answer selection model to pick the correct one. As shown in our analysis, RASO expands the knowledge coverage from in-domain training data by a large margin. We provide extensive experimentation and show the effectiveness of our pipeline by advancing the state of the art by 4.1% on OK-VQA, without additional computational cost. Code and models are released at http://cogcomp.org/page/publication_view/1010
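The generate-then-select idea can be sketched as below: a PLM proposes a broad candidate set, and a small selector picks one answer. The generate_candidates and select_answer callables, and the use of an image caption as the visual context, are placeholders rather than RASO's exact components.

# Sketch of a generate-then-select VQA pipeline in the spirit of RASO.
from typing import Callable, List

def answer_vqa(question: str,
               image_caption: str,
               generate_candidates: Callable[[str], List[str]],
               select_answer: Callable[[str, List[str]], str]) -> str:
    """Generate a broad candidate answer set first, then select one answer."""
    prompt = f"Context: {image_caption}\nQuestion: {question}\nList possible answers."
    candidates = generate_candidates(prompt)
    if not candidates:
        return ""
    # Coverage is decided by generation; precision is decided by selection.
    return select_answer(question, candidates)

The design point is the division of labor: the generator only needs high recall over plausible answers, while the lightweight selector is trained to be precise among them.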
Benchmarking Diverse-Modal Entity Linking with Generative Models
Wang, Sijia, Li, Alexander Hanbo, Zhu, Henry, Zhang, Sheng, Hang, Chung-Wei, Perera, Pramuditha, Ma, Jie, Wang, William, Wang, Zhiguo, Castelli, Vittorio, Xiang, Bing, Ng, Patrick
Entities can be expressed in diverse formats, such as texts, images, or column names and cell values in tables. While existing entity linking (EL) models work well in per-modality configurations, such as text-only EL, visual grounding, or schema linking, it is more challenging to design a unified model for diverse modality configurations. To bring various modality configurations together, we constructed a benchmark for diverse-modal EL (DMEL) from existing EL datasets, covering all three modalities: text, image, and table. To approach the DMEL task, we proposed a generative diverse-modal model (GDMM) following a multimodal encoder-decoder paradigm. Pre-training GDMM with rich corpora builds a solid foundation for DMEL without storing the entire KB for inference. Fine-tuning GDMM builds a stronger DMEL baseline, outperforming state-of-the-art task-specific EL models by 8.51 F1 score on average. Additionally, extensive error analyses are conducted to highlight the challenges of DMEL, facilitating future research on this task.
DecAF: Joint Decoding of Answers and Logical Forms for Question Answering over Knowledge Bases
Yu, Donghan, Zhang, Sheng, Ng, Patrick, Zhu, Henghui, Li, Alexander Hanbo, Wang, Jun, Hu, Yiqun, Wang, William, Wang, Zhiguo, Xiang, Bing
Question answering over knowledge bases (KBs) aims to answer natural language questions with factual information such as entities and relations in KBs. Previous methods either generate logical forms that can be executed over KBs to obtain final answers, or predict answers directly. Empirical results show that the former often produces more accurate answers, but it suffers from non-execution issues due to potential syntactic and semantic errors in the generated logical forms. In this work, we propose a novel framework, DecAF, that jointly generates both logical forms and direct answers, and then combines the merits of the two to obtain the final answers. DecAF is based on simple free-text retrieval without relying on any entity linking tools -- this simplification eases its adaptation to different datasets. DecAF achieves new state-of-the-art accuracy on the WebQSP, FreebaseQA, and GrailQA benchmarks, while achieving competitive results on the ComplexWebQuestions benchmark.
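The joint use of the two decoding branches can be illustrated with a simple combination rule: answers obtained by executing generated logical forms are preferred when execution succeeds, with directly generated answers acting as a fallback. The scoring weights and the execute_logical_form callable below are assumptions for illustration, not DecAF's actual combination strategy.

# Simplified sketch of combining executed logical forms with direct answers.
from collections import Counter
from typing import Callable, List, Optional

def combine_answers(logical_forms: List[str],
                    direct_answers: List[str],
                    execute_logical_form: Callable[[str], Optional[List[str]]],
                    lf_weight: float = 2.0,
                    da_weight: float = 1.0) -> List[str]:
    """Score answers from both decoding branches and return the top-scoring ones."""
    scores: Counter = Counter()
    for lf in logical_forms:
        answers = execute_logical_form(lf)  # None if the logical form is non-executable
        if answers:
            for a in answers:
                scores[a] += lf_weight
    for a in direct_answers:
        scores[a] += da_weight  # direct answers guarantee some output and break ties
    if not scores:
        return []
    best = max(scores.values())
    return [a for a, score in scores.items() if score == best]

When every generated logical form fails to execute, the direct answers still yield an output, which is exactly the failure mode the joint approach is meant to cover.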