
Collaborating Authors

 Ding, Bosheng


StructTest: Benchmarking LLMs' Reasoning through Compositional Structured Outputs

arXiv.org Artificial Intelligence

The rapid development of large language models (LLMs) necessitates robust, unbiased, and scalable methods for evaluating their capabilities. However, human annotation is expensive to scale, model-based evaluation is prone to answer-style biases, and target-answer-based benchmarks are vulnerable to data contamination and cheating. To address these limitations, we propose StructTest, a novel benchmark that evaluates LLMs on their ability to produce compositionally specified structured outputs as an unbiased, cheap-to-run, and difficult-to-cheat measure. The evaluation is performed deterministically by a rule-based evaluator, which can be easily extended to new tasks. By testing structured outputs across diverse task domains, including summarization, code, HTML, and math, we demonstrate that StructTest serves as a good proxy for general reasoning ability, since producing structured outputs often requires internal logical reasoning. We believe that StructTest offers a critical, complementary approach to objective and robust model evaluation.
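The deterministic evaluation idea is easy to illustrate. Below is a minimal, hypothetical rule-based checker in the spirit of StructTest, not the benchmark's actual code; the format instruction being verified, the function name, and the parameters are all invented for illustration:

```python
def check_bullet_summary(output: str, n_bullets: int = 3, max_words: int = 20) -> bool:
    """Deterministically verify a compositional format instruction:
    exactly `n_bullets` lines, each a '- ' bullet of at most `max_words` words."""
    lines = [line for line in output.strip().splitlines() if line.strip()]
    if len(lines) != n_bullets:
        return False
    for line in lines:
        # Every bullet must use the requested prefix and respect the word cap.
        if not line.startswith("- ") or len(line[2:].split()) > max_words:
            return False
    return True

print(check_bullet_summary("- one\n- two\n- three"))  # True
print(check_bullet_summary("1. one\n2. two"))         # False
```

Because the check is a pure string-level pass/fail, it needs no judge model, which is what makes the metric unbiased, cheap to run, and hard to game.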


Relevant or Random: Can LLMs Truly Perform Analogical Reasoning?

arXiv.org Artificial Intelligence

Analogical reasoning is a unique human ability to address unfamiliar challenges by transferring strategies from relevant past experiences. One key finding in psychology is that, compared with irrelevant past experiences, recalling relevant ones helps humans better handle new tasks. Coincidentally, the NLP community has recently found that self-generating relevant examples in the context can help large language models (LLMs) solve a given problem better than hand-crafted prompts. However, it is not yet clear whether relevance is the key factor eliciting this capability, i.e., can LLMs benefit more from self-generated relevant examples than from irrelevant ones? In this work, we systematically explore whether LLMs can truly perform analogical reasoning on a diverse set of reasoning tasks. With extensive experiments and analysis, we show that self-generated random examples can surprisingly achieve comparable or even better performance, e.g., a 4% performance boost on GSM8K with random biological examples. We find that the accuracy of self-generated examples is the key factor and subsequently design two improved methods with significantly reduced inference costs. Overall, we aim to advance a deeper understanding of LLM analogical reasoning and hope this work stimulates further research into the design of self-generated contexts.
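Operationally, the comparison the paper runs reduces to changing one field of a two-step prompt. A minimal sketch, assuming a plain-text prompting interface; the function name and prompt wording below are illustrative, not the paper's exact prompts:

```python
def build_prompt(problem: str, example_topic: str) -> str:
    """Two-step 'analogical' prompt: the model first self-generates worked
    examples on `example_topic`, then solves the target problem."""
    return (
        f"Step 1: Recall three worked examples of {example_topic} problems, "
        "writing out each solution in full.\n"
        f"Step 2: Using the same style, solve this problem step by step:\n{problem}"
    )

question = "A train travels 60 km in 1.5 hours. What is its average speed?"
relevant_prompt = build_prompt(question, "math word")  # topically relevant examples
random_prompt = build_prompt(question, "biology")      # random-domain examples
```

Swapping `example_topic` between relevant and random domains while holding everything else fixed is what isolates relevance as the variable under test.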


How Much are LLMs Contaminated? A Comprehensive Survey and the LLMSanitize Library

arXiv.org Artificial Intelligence

With the rise of Large Language Models (LLMs) in recent years, new opportunities are emerging, but also new challenges, and contamination is quickly becoming critical. Business applications and fundraising in AI have reached a scale at which a few percentage points gained on popular question-answering benchmarks could translate into tens of millions of dollars, placing high pressure on model integrity. At the same time, it is becoming harder and harder, if not impossible, to keep track of the data that LLMs have seen, since closed-source models like GPT-4 and Claude-3 divulge no information about their training sets. As a result, contamination becomes a critical issue: LLMs' performance may no longer be reliable, as their high performance may be at least partly due to previous exposure to the evaluation data. This limitation jeopardizes overall progress in the field of NLP, yet there remains a lack of methods to efficiently address contamination and no clear consensus on its prevention, mitigation, and classification.
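As a concrete illustration, here is one common string-matching contamination check of the kind such surveys cover: flag a benchmark item that shares a long n-gram with a training document. This is a generic sketch of the heuristic, not the LLMSanitize library's API:

```python
def ngrams(text: str, n: int) -> set:
    """All word-level n-grams of `text`, lowercased."""
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def shares_long_ngram(benchmark_item: str, training_doc: str, n: int = 8) -> bool:
    """Verbatim-leakage heuristic: any shared long n-gram is strong evidence
    that the benchmark text appeared in the training data."""
    return bool(ngrams(benchmark_item, n) & ngrams(training_doc, n))
```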


Chain-of-Knowledge: Grounding Large Language Models via Dynamic Knowledge Adapting over Heterogeneous Sources

arXiv.org Artificial Intelligence

We present Chain-of-Knowledge (CoK), a framework that grounds large language models by dynamically adapting knowledge from heterogeneous sources. It results in more factual rationales and reduced hallucination in generation. Specifically, CoK consists of three stages: reasoning preparation, dynamic knowledge adapting, and answer consolidation. Given a knowledge-intensive question, CoK first prepares several preliminary rationales and answers while identifying the relevant knowledge domains. If there is no majority consensus among the sampled answers, CoK corrects the rationales step by step by adapting knowledge from the identified domains. These corrected rationales can plausibly serve as a better foundation for the final answer consolidation. Unlike prior studies that primarily use unstructured data, CoK also leverages structured knowledge sources such as Wikidata and tables, which provide more reliable factual information. To access both unstructured and structured knowledge sources in the dynamic knowledge adapting stage, we propose an adaptive query generator that supports multiple query languages, including SPARQL, SQL, and natural sentences. Moreover, to minimize error propagation between rationales, CoK corrects the rationales progressively, using preceding corrected rationales to generate and correct subsequent ones. Extensive experiments show that CoK consistently improves the performance of LLMs on knowledge-intensive tasks across different domains. In recent years, large language models (LLMs) such as ChatGPT (OpenAI, 2023) have demonstrated impressive language generation capabilities (Cheng et al., 2023; Ding et al., 2023). However, one major challenge of LLMs lies in hallucination: their tendency to confidently generate plausible but factually incorrect texts (Ji et al., 2023). As shown in Figure 1, given a question that requires factual knowledge to answer, such as "What year was the Argentine actor who directed El Tio Disparate born?", even the most advanced LLMs often provide an incorrect answer. While LLMs have the remarkable capability to recall information from their training data, effectively updating or controlling the factual knowledge within these models remains challenging (Luo et al., 2023). A promising direction for addressing hallucination is to augment LLMs with external knowledge (Mialon et al., 2023). These methods pair the LLM with a retrieval system that uses external factual knowledge to guide the generation process: instead of relying solely on the internal knowledge acquired during training, the model can fetch relevant information from external sources. We will make our code and data publicly available.
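The three-stage pipeline can be summarized in a short skeleton. This is a paraphrase of the stages described above, with hypothetical helpers `llm(prompt)` (returns text) and `retrieve(domain, claim)` (returns facts via SPARQL, SQL, or natural-language queries); it is not the authors' released code:

```python
from collections import Counter

def chain_of_knowledge(question: str, llm, retrieve, k: int = 5) -> str:
    # Stage 1: reasoning preparation -- sample k rationale/answer pairs
    # (domain identification is elided in this sketch).
    samples = [llm(f"Answer step by step:\n{question}") for _ in range(k)]
    answers = [s.strip().splitlines()[-1] for s in samples]
    answer, votes = Counter(answers).most_common(1)[0]
    if votes > k // 2:
        return answer  # majority consensus: no knowledge adapting needed

    # Stage 2: dynamic knowledge adapting -- correct the rationale step by
    # step, conditioning each correction on the previously corrected steps.
    corrected = []
    for step in samples[0].strip().splitlines():
        facts = retrieve("identified_domain", step)
        prior = "\n".join(corrected)  # progressive correction against prior steps
        corrected.append(llm(f"Previously corrected steps:\n{prior}\n"
                             f"Revise this step using the facts {facts}:\n{step}"))

    # Stage 3: answer consolidation from the fully corrected rationale.
    return llm("Rationale:\n" + "\n".join(corrected) +
               f"\nFinal answer to: {question}")
```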


Retrieving Multimodal Information for Augmented Generation: A Survey

arXiv.org Artificial Intelligence

As Large Language Models (LLMs) become popular, an important trend has emerged of using multimodality to augment LLMs' generation ability, enabling LLMs to better interact with the world. However, there is no unified understanding of at which stage and how different modalities should be incorporated. In this survey, we review methods that assist and augment generative models by retrieving multimodal knowledge, whose formats range from images, code, tables, and graphs to audio. Such methods offer a promising solution to important concerns such as factuality, reasoning, interpretability, and robustness. By providing an in-depth review, this survey is expected to give scholars a deeper understanding of the methods' applications and encourage them to adapt existing techniques to the fast-growing field of LLMs.
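Many of the surveyed methods share a retrieve-then-generate skeleton in which non-text items are serialized to text (captions for images, linearized rows for tables) before prompting. A toy sketch under that assumption, with word overlap standing in for a real retriever and `llm` for any generator:

```python
from dataclasses import dataclass

@dataclass
class Item:
    modality: str  # "image", "code", "table", "graph", "audio", ...
    text: str      # caption or linearization used for retrieval and prompting

def rag_answer(query: str, items: list[Item], llm, k: int = 3) -> str:
    """Retrieve the k items most lexically similar to the query and
    condition generation on their text forms."""
    def overlap(item: Item) -> int:
        return len(set(query.lower().split()) & set(item.text.lower().split()))
    top = sorted(items, key=overlap, reverse=True)[:k]
    context = "\n".join(f"[{item.modality}] {item.text}" for item in top)
    return llm(f"Context:\n{context}\n\nQuestion: {query}")
```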


Is GPT-3 a Good Data Annotator?

arXiv.org Artificial Intelligence

The democratization of artificial intelligence (AI) (Garvey, 2018; Rubeis et al., 2022) aims to provide access to AI technologies to all members of society, including individuals, small- and medium-sized enterprises (SMEs), academic research labs, and nonprofit organizations. Achieving this goal is crucial for the promotion of innovation, economic growth, and fairness and equality. As typical AI models are usually data-hungry, one significant obstacle to AI democratization is the preparation of high-quality labeled data. Evaluations show that GPT-3 has gained, through pretraining, a surprisingly wide range of knowledge, which can be transferred to downstream tasks through knowledge distillation (Kim et al., 2022); we present some examples in Appendix A.12. Due to the model architecture and pretraining tasks designed for auto-regressive generation, GPT-3 is capable of generating human-like text and performing a broad array of NLP tasks, such as machine translation, summarization, and question answering.
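The annotation setup the paper studies boils down to a labeling loop over unlabeled text. A minimal sketch, with `llm(prompt)` standing in for a GPT-3 completion call; the prompt wording and function names are illustrative, not the paper's:

```python
def annotate(texts: list[str], labels: list[str], llm) -> list[str]:
    """Zero-shot annotation: ask the model to choose one label per text,
    producing synthetic training data for a downstream model."""
    annotations = []
    for text in texts:
        prompt = (f"Classify the text into one of {labels}.\n"
                  f"Text: {text}\nLabel:")
        annotations.append(llm(prompt).strip())
    return annotations
```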


LogicLLM: Exploring Self-supervised Logic-enhanced Training for Large Language Models

arXiv.org Artificial Intelligence

Existing efforts to improve the logical reasoning ability of language models have predominantly relied on supervised fine-tuning, hindering generalization to new domains and/or tasks. The development of Large Language Models (LLMs) has demonstrated the capacity to compress abundant knowledge into a single proxy, enabling them to tackle multiple tasks effectively. Our preliminary experiments, nevertheless, show that LLMs do not exhibit strong logical reasoning capability: their performance on logical reasoning benchmarks is far behind existing state-of-the-art baselines. In this paper, we make the first attempt to investigate the feasibility of incorporating logical knowledge through self-supervised post-training and activating it via in-context learning, which we term LogicLLM. Specifically, we devise an auto-regressive objective variant of MERIt and integrate it with two LLM series, i.e., FLAN-T5 and LLaMA, with parameter sizes ranging from 3 billion to 13 billion. The results on two challenging logical reasoning benchmarks demonstrate the effectiveness of LogicLLM. Besides, we conduct extensive ablation studies to analyze the key factors in designing logic-oriented proxy tasks.
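Mechanically, the post-training step is just continued auto-regressive training on logically linked text; the logic-specific part (how MERIt-style training instances are mined) is elided here. A minimal sketch using Hugging Face transformers, with GPT-2 as a small causal-LM stand-in for the FLAN-T5/LLaMA models actually used:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# One self-supervised instance: the model learns to continue a premise
# with its logical consequence.
premise = "All metals conduct electricity. Copper is a metal."
consequence = "Therefore, copper conducts electricity."

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tokenizer(premise + " " + consequence, return_tensors="pt").input_ids
loss = model(input_ids=ids, labels=ids).loss  # standard causal-LM loss
loss.backward()  # one post-training step; optimizer update omitted
```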


Panda LLM: Training Data and Evaluation for Open-Sourced Chinese Instruction-Following Large Language Models

arXiv.org Artificial Intelligence

Large language models have gained immense popularity due to their exceptional versatility in various natural language processing tasks such as code writing and article editing, making them ubiquitous in various industries and significantly enhancing people's productivity (Ding et al., 2022; Zhao et al., 2023). However, there are limitations to current off-the-shelf instruction-following large language models, including a lack of trustworthiness in generated results, a lack of transparency in the model used, which raises concerns about data security, and an unknown training recipe. Our Panda LLM has been trained on the Chinese-Wiki-2019, Chinese-News-2016, Chinese-Baike-2018, Chinese-Webtext-2019, and Translation-2019 (Xu, 2019) and COIG (Zhang et al., 2023) datasets with instruction tuning (Wei et al., 2021), based on the LLaMA model (Touvron et al., 2023). It is also the first released LLM of the Dandelion Project. Anticipated future releases include progressively larger models such as Panda-13B and Panda-33B, with expected release dates in the near future.
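Instruction tuning of this kind typically starts by flattening each (instruction, input, output) record into a single training string. A sketch of one plausible Alpaca-style template; the report's exact prompt format is not given here, so the field names and wording are assumptions:

```python
def format_example(example: dict) -> str:
    """Flatten an instruction-tuning record into one training string."""
    if example.get("input"):
        return (f"Instruction: {example['instruction']}\n"
                f"Input: {example['input']}\n"
                f"Response: {example['output']}")
    return (f"Instruction: {example['instruction']}\n"
            f"Response: {example['output']}")

print(format_example({
    "instruction": "Translate the text to Chinese.",
    "input": "Good morning.",
    "output": "早上好。",
}))
```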


Can ChatGPT-like Generative Models Guarantee Factual Accuracy? On the Mistakes of New Generation Search Engines

arXiv.org Artificial Intelligence

Although large conversational AI models such as OpenAI's ChatGPT have demonstrated great potential, we question whether such models can guarantee factual accuracy. Recently, technology companies such as Microsoft and Google have announced new services which aim to combine search engines with conversational AI. However, we have found numerous mistakes in the public demonstrations that suggest we should not easily trust the factual claims of the AI models. Rather than criticizing specific models or companies, we hope to call on researchers and developers to improve AI models' transparency and factual correctness.


DAGA: Data Augmentation with a Generation Approach for Low-resource Tagging Tasks

arXiv.org Artificial Intelligence

Data augmentation techniques have been widely used to improve machine learning performance, as they enhance the generalization capability of models. In this work, to generate high-quality synthetic data for low-resource tagging tasks, we propose a novel augmentation method based on language models trained on linearized labeled sentences. Our method is applicable to both supervised and semi-supervised settings. For the supervised settings, we conduct extensive experiments on named entity recognition (NER), part-of-speech (POS) tagging, and end-to-end target-based sentiment analysis (E2E-TBSA) tasks. For the semi-supervised settings, we evaluate our method on the NER task under the conditions of unlabeled data only and unlabeled data plus a knowledge base. The results show that our method consistently outperforms the baselines, particularly when the available gold training data are scarce.
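The core trick is the linearization: label tokens are spliced into the sentence so that an ordinary language model learns words and tags jointly. A minimal sketch of one such scheme (inserting each non-O tag before its word), which follows the paper's description but simplifies the details:

```python
def linearize(tokens: list[str], tags: list[str]) -> str:
    """Insert each non-O tag immediately before its word, turning a labeled
    sentence into a plain token sequence a language model can be trained on."""
    sequence = []
    for token, tag in zip(tokens, tags):
        if tag != "O":
            sequence.append(tag)
        sequence.append(token)
    return " ".join(sequence)

print(linearize(["John", "lives", "in", "New", "York"],
                ["B-PER", "O", "O", "B-LOC", "I-LOC"]))
# -> "B-PER John lives in B-LOC New I-LOC York"
```

Sampling from a language model trained on such sequences and then reversing the transformation yields new synthetic labeled sentences.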