Su, Hui
Unveiling the Key Factors for Distilling Chain-of-Thought Reasoning
Chen, Xinghao, Sun, Zhijing, Guo, Wenjin, Zhang, Miaoran, Chen, Yanjun, Sun, Yirong, Su, Hui, Pan, Yijie, Klakow, Dietrich, Li, Wenjie, Shen, Xiaoyu
Large Language Models (LLMs) excel in reasoning tasks through Chain-of-Thought (CoT) prompting. However, CoT prompting greatly increases computational demands, which has prompted growing interest in distilling CoT capabilities into Small Language Models (SLMs). This study systematically examines the factors influencing CoT distillation, including the choice of granularity, format, and teacher model. Through experiments involving four teacher models and seven student models across seven mathematical and commonsense reasoning datasets, we uncover three key findings: (1) Unlike LLMs, SLMs exhibit a non-monotonic relationship with granularity, with stronger models benefiting from finer-grained reasoning and weaker models performing better with simpler CoT supervision; (2) CoT format significantly impacts LLMs but has minimal effect on SLMs, likely due to their reliance on supervised fine-tuning rather than pretraining preferences; (3) Stronger teacher models do NOT always produce better student models, as diversity and complexity in CoT supervision can outweigh accuracy alone. These findings emphasize the need to tailor CoT strategies to the specific student model, offering actionable insights for optimizing CoT distillation in SLMs. The code and datasets are available at https://github.com/EIT-NLP/Distilling-CoT-Reasoning.
Detecting Errors through Ensembling Prompts (DEEP): An End-to-End LLM Framework for Detecting Factual Errors
Chandler, Alex, Surve, Devesh, Su, Hui
Accurate text summarization is one of the most common and important tasks performed by Large Language Models, where the cost of human review of an entire document may be high, but the cost of errors in summarization may be even greater. We propose Detecting Errors through Ensembling Prompts (DEEP), an end-to-end large language model framework for detecting factual errors in text summarization. Our framework uses a diverse set of LLM prompts to identify factual inconsistencies, treating their outputs as binary features, which are then fed into ensembling models. We then calibrate the ensembled models to produce empirically accurate probabilities that a text is factually consistent or free of hallucination. We demonstrate that prior models for detecting factual errors in summaries perform significantly worse when their decision thresholds are not tuned on subsets of the evaluated dataset. Our framework achieves state-of-the-art (SOTA) balanced accuracy on the AggreFact-XSUM FTSOTA, TofuEval Summary-Level, and HaluEval Summarization benchmarks in detecting factual errors within transformer-generated text summaries. It does so without any fine-tuning of the language model or reliance on thresholding techniques not available in practical settings.
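To make the pipeline concrete, below is a minimal Python sketch of a DEEP-style detector, assuming a hypothetical ask_llm wrapper and illustrative prompt templates; the logistic-regression ensembler and sigmoid calibration are assumptions for illustration, not the paper's exact configuration.

# Hypothetical sketch of a DEEP-style pipeline: several LLM prompts each return a
# yes/no verdict on the summary, which is ensembled and calibrated into a probability.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.calibration import CalibratedClassifierCV

PROMPTS = [  # illustrative templates, not the paper's actual prompts
    "Is every claim in the summary supported by the article? Answer yes or no.\n\nArticle: {article}\n\nSummary: {summary}",
    "Does the summary avoid adding facts absent from the article? Answer yes or no.\n\nArticle: {article}\n\nSummary: {summary}",
    "Are all names, numbers, and dates in the summary consistent with the article? Answer yes or no.\n\nArticle: {article}\n\nSummary: {summary}",
]

def prompt_verdicts(article, summary, ask_llm):
    """Query each prompt and map the LLM answer to a binary feature (1 = answered 'yes')."""
    answers = [ask_llm(p.format(article=article, summary=summary)) for p in PROMPTS]
    return np.array([1.0 if a.strip().lower().startswith("yes") else 0.0 for a in answers])

def fit_calibrated_ensemble(features, labels):
    """Ensemble the binary prompt features and calibrate the resulting probabilities."""
    model = CalibratedClassifierCV(LogisticRegression(), method="sigmoid", cv=5)
    model.fit(features, labels)  # labels: 1 = factually consistent, 0 = inconsistent
    return model

# Usage, given labelled development pairs and an LLM wrapper `ask_llm`:
#   X = np.stack([prompt_verdicts(a, s, ask_llm) for a, s in dev_pairs])
#   model = fit_calibrated_ensemble(X, np.array(dev_labels))
#   p_consistent = model.predict_proba(prompt_verdicts(article, summary, ask_llm)[None, :])[:, 1]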
Unraveling the Mystery of Scaling Laws: Part I
Su, Hui, Tian, Zhi, Shen, Xiaoyu, Cai, Xunliang
Scaling law principles indicate a power-law correlation between loss and variables such as model size, dataset size, and the computational resources used during training. These principles play a vital role in optimizing various aspects of model pre-training, ultimately contributing to the success of large language models such as GPT-4, Llama, and Gemini. However, the original scaling law paper by OpenAI did not disclose the complete details necessary to derive the precise scaling law formulas, and its conclusions are based only on models containing up to 1.5 billion parameters. Though some subsequent works attempt to unveil these details and scale to larger models, they often neglect the training dependence of important factors such as the learning rate, context length, and batch size, and therefore fail to establish a reliable formula for predicting the test loss trajectory. In this technical report, we confirm that the scaling law formulations proposed in the original OpenAI paper remain valid when scaling the model size up to 33 billion parameters, but the constant coefficients in these formulas vary significantly with the experimental setup. We meticulously identify the influential factors and provide transparent, step-by-step instructions for estimating all constant terms in the scaling-law formulas by training models with only 1M to 60M parameters. Using these estimated formulas, we showcase the capability to accurately predict various attributes for models with up to 33B parameters before their training, including (1) the minimum possible test loss; (2) the minimum required training steps and processed tokens to achieve a specific loss; (3) the critical batch size with an optimal time/computation trade-off at any loss value; and (4) the complete test loss trajectory with arbitrary batch size.
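For reference, the power-law forms in question are those of the original OpenAI scaling-law paper (Kaplan et al., 2020), written below in their commonly cited generic form; the constants $N_c$, $D_c$, $S_c$ and exponents $\alpha_N$, $\alpha_D$, $\alpha_S$ are presumably the experiment-dependent coefficients the report estimates from small-model runs.
\[
L(N) = \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad
L(D) = \left(\frac{D_c}{D}\right)^{\alpha_D}, \qquad
L(N, S) = \left(\frac{N_c}{N}\right)^{\alpha_N} + \left(\frac{S_c}{S_{\min}(S)}\right)^{\alpha_S},
\]
where $N$ is the number of non-embedding parameters, $D$ the number of training tokens, and $S_{\min}(S)$ the minimum number of optimization steps corresponding to $S$ actual steps at a given batch size.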
WeLM: A Well-Read Pre-trained Language Model for Chinese
Su, Hui, Zhou, Xiao, Yu, Houjin, Shen, Xiaoyu, Chen, Yuwen, Zhu, Zilin, Yu, Yang, Zhou, Jie
Large Language Models pre-trained with self-supervised learning have demonstrated impressive zero-shot generalization capabilities on a wide spectrum of tasks. In this work, we present WeLM: a well-read pre-trained language model for Chinese that is able to seamlessly perform different types of tasks with zero or a few demonstrations. WeLM is trained with 10B parameters by "reading" a curated high-quality corpus covering a wide range of topics. We show that WeLM is equipped with broad knowledge of various domains and languages. On 18 monolingual (Chinese) tasks, WeLM significantly outperforms existing pre-trained models of similar size and matches the performance of models up to 25 times larger. WeLM also exhibits strong capabilities in multi-lingual and code-switching understanding, outperforming existing multilingual language models pre-trained on 30 languages. Furthermore, we collected human-written prompts for a large set of supervised datasets in Chinese and fine-tuned WeLM with multi-prompted training. The resulting model attains strong generalization on unseen types of tasks and outperforms the unsupervised WeLM in zero-shot learning. Finally, we demonstrate that WeLM has basic skills at explaining and calibrating its own decisions, which can be promising directions for future research. Our models can be accessed via https://welm.weixin.qq.com/docs/api/.
Narrative Question Answering with Cutting-Edge Open-Domain QA Techniques: A Comprehensive Study
Mou, Xiangyang, Yang, Chenghao, Yu, Mo, Yao, Bingsheng, Guo, Xiaoxiao, Potdar, Saloni, Su, Hui
Recent advancements in open-domain question answering (ODQA), i.e., finding answers from a large open-domain corpus like Wikipedia, have led to human-level performance on many datasets. However, progress in QA over book stories (Book QA) lags behind despite its similar task formulation to ODQA. This work provides a comprehensive and quantitative analysis of the difficulty of Book QA: (1) We benchmark the research on the NarrativeQA dataset with extensive experiments using cutting-edge ODQA techniques. This quantifies the challenges Book QA poses, as well as advances the published state-of-the-art with a $\sim$7\% absolute improvement on Rouge-L. (2) We further analyze the detailed challenges in Book QA through human studies.\footnote{\url{https://github.com/gorov/BookQA}.} Our findings indicate that event-centric questions dominate this task, which exemplifies the inability of existing QA models to handle event-oriented scenarios.
Improving Multi-turn Dialogue Modelling with Utterance ReWriter
Su, Hui, Shen, Xiaoyu, Zhang, Rongzhi, Sun, Fei, Hu, Pengwei, Niu, Cheng, Zhou, Jie
Recent research has made impressive progress in single-turn dialogue modelling. In the multi-turn setting, however, current models are still far from satisfactory. One major challenge is the frequent occurrence of coreference and information omission in daily conversation, which makes it hard for machines to understand the real intention. In this paper, we propose rewriting the human utterance as a preprocessing step to help multi-turn dialogue modelling. Each utterance is first rewritten to recover all coreferred and omitted information. The subsequent processing steps are then performed on the rewritten utterance. To properly train the utterance rewriter, we collect a new dataset with human annotations and introduce a Transformer-based utterance rewriting architecture using the pointer network. We show that the proposed architecture achieves remarkably good performance on the utterance rewriting task. The trained utterance rewriter can be easily integrated into online chatbots and brings general improvements across different domains.
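As a rough illustration of the copy mechanism (a generic pointer-style formulation, not necessarily the exact equations of this paper), the probability of emitting word $w$ at each rewriting step can be decomposed over the two copy sources:
\[
p(w) = \lambda \sum_{i:\, h_i = w} a_i^{H} + (1 - \lambda) \sum_{j:\, u_j = w} a_j^{U},
\]
where $a^{H}$ and $a^{U}$ are the decoder's attention distributions over the dialogue history $H$ and the current utterance $U$, and $\lambda \in [0,1]$ is a learned gate choosing which source to copy from. Recovering an omitted or coreferred entity then amounts to placing attention mass on its mention in the history.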
Toward Cognitive and Immersive Systems: Experiments in a Cognitive Microworld
Peveler, Matthew, Govindarajulu, Naveen Sundar, Bringsjord, Selmer, Srivastava, Biplav, Talamadupula, Kartik, Su, Hui
As computational power has continued to increase and sensors have become more accurate, systems that are at once cognitive and immersive have arrived. These \textit{cognitive and immersive systems} (CAISs) fall squarely into the intersection of AI with HCI/HRI: such systems interact with and assist the human agents that enter them, in no small part because such systems are infused with AI able to understand and reason about these humans and their knowledge, beliefs, goals, communications, plans, etc. We herein explain our approach to engineering CAISs. We emphasize the capacity of a CAIS to develop and reason over a `theory of the mind' of its human partners. This capacity entails that the AI in question has a sophisticated model of the beliefs, knowledge, goals, desires, emotions, etc.\ of these humans. To accomplish this engineering, a formal framework of very high expressivity is needed. In our case, this framework is a \textit{cognitive event calculus}, a particular kind of quantified multi-operator modal logic, and a matching high-expressivity automated reasoner and planner. To explain, advance, and to a degree validate our approach, we show that a calculus of this type satisfies a set of formal requirements, and can enable a CAIS to understand a psychologically tricky scenario couched in what we call the \textit{cognitive polysolid framework} (CPF). We also formally show that a room that satisfies these requirements can have a useful property we term \emph{expectation of usefulness}. CPF, a sub-class of \textit{cognitive microworlds}, includes machinery able to represent and plan over not merely blocks and actions (such as seen in the primitive `blocks worlds' of old), but also over agents and their mental attitudes about both other agents and inanimate objects.
NEXUS Network: Connecting the Preceding and the Following in Dialogue Generation
Su, Hui, Shen, Xiaoyu, Li, Wenjie, Klakow, Dietrich
Sequence-to-Sequence (seq2seq) models have become overwhelmingly popular in building end-to-end trainable dialogue systems. Though highly efficient in learning the backbone of human-computer communication, they suffer from the problem of strongly favoring short generic responses. In this paper, we argue that a good response should smoothly connect both the preceding dialogue history and the following conversation. We strengthen this connection through mutual information maximization. To sidestep the non-differentiability of discrete natural language tokens, we introduce an auxiliary continuous code space and map this code space to a learnable prior distribution for generation purposes. Experiments on two dialogue datasets validate the effectiveness of our model, where the generated responses are closely related to the dialogue context and lead to more interactive conversations.
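The mutual-information objective can be grounded in a standard variational lower bound (stated generically here; the paper's exact estimator may differ). For a latent code $z$ and the following conversation $x_{f}$,
\[
I(z;\, x_{f}) \;\ge\; H(x_{f}) + \mathbb{E}_{p(z,\, x_{f})}\big[\log q(x_{f} \mid z)\big],
\]
so maximizing an auxiliary reconstruction term $\mathbb{E}[\log q(x_{f} \mid z)]$ tightens a lower bound on the mutual information between the code and the following conversation, encouraging responses that remain predictive of what comes next.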
A Cost-Effective Framework for Preference Elicitation and Aggregation
Zhao, Zhibing, Li, Haoming, Wang, Junming, Kephart, Jeffrey, Mattei, Nicholas, Su, Hui, Xia, Lirong
We propose a cost-effective framework for preference elicitation and aggregation under the Plackett-Luce model with features. Given a budget, our framework iteratively computes the most cost-effective elicitation questions in order to help the agents make better group decisions. We illustrate the viability of the framework with an experiment on Amazon Mechanical Turk, which estimates the cost of answering different types of elicitation questions. We compare the prediction accuracy of our framework when adopting various information criteria that evaluate the expected information gain from a question. Our experiments show that carefully designed information criteria are much more efficient than randomly asking questions under a budget constraint.
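For context, the Plackett-Luce model with features assigns each alternative $a$ a utility $\theta_a = \beta^\top x_a$ from its feature vector $x_a$ (linear utilities are the usual choice; the paper's exact parameterization may differ), and a full ranking $\sigma$ of $m$ alternatives then has probability
\[
\Pr(\sigma \mid \beta) = \prod_{i=1}^{m} \frac{\exp\!\left(\beta^\top x_{\sigma(i)}\right)}{\sum_{j=i}^{m} \exp\!\left(\beta^\top x_{\sigma(j)}\right)}.
\]
Elicitation questions reveal partial information about such rankings, and an information criterion scores a candidate question by its expected information gain about $\beta$ under this likelihood.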
Towards Better Variational Encoder-Decoders in Seq2Seq Tasks
Shen, Xiaoyu (Max Planck Institute for Informatics) | Su, Hui (Software Institute, University of Chinese Academy of Sciences, China)
Variational encoder-decoders have shown promising results in seq2seq tasks. However, the training process is notoriously difficult to control, because the latent variables tend to be ignored during decoding. In this paper, we thoroughly analyze the reason behind this training difficulty, compare different ways of alleviating it, and propose a new framework that helps significantly improve the overall performance.
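Concretely, a conditional variational encoder-decoder for seq2seq is typically trained on the evidence lower bound (a standard formulation, not necessarily this paper's exact objective):
\[
\log p_\theta(y \mid x) \;\ge\; \mathbb{E}_{q_\phi(z \mid x, y)}\big[\log p_\theta(y \mid z, x)\big] \;-\; \mathrm{KL}\big(q_\phi(z \mid x, y) \,\|\, p_\theta(z \mid x)\big),
\]
and the training difficulty referred to above is that the KL term collapses to zero, i.e. the decoder learns to ignore $z$.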