AITopics | Zhou, Bowen

Plotting

Zhou, Bowen

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

CRaSh: Clustering, Removing, and Sharing Enhance Fine-tuning without Full Large Language Model

Zhang, Kaiyan, Ding, Ning, Qi, Biqing, Zhu, Xuekai, Long, Xinwei, Zhou, Bowen

arXiv.org Artificial IntelligenceOct-23-2023

Instruction tuning has recently been recognized as an effective way of aligning Large Language Models (LLMs) to enhance their generalization ability across various tasks. However, when tuning publicly accessible, centralized LLMs with private instruction data, privacy concerns are inevitable. While direct transfer of parameterized modules between models is a plausible approach to address this, its implications and effectiveness need further exploration. This paper focuses on Offsite-Tuning (OFT), a representative technique that transfers transformer blocks between centralized LLMs and downstream emulators. Given the limited understanding of the underlying mechanism of OFT, we perform an empirical analysis on LLMs from the perspectives of representation and functional similarity. Interestingly, our findings reveal a unique modular structure within the layers of LLMs that appears to emerge as the model size expands. Simultaneously, we note subtle but potentially significant changes in representation and intermediate predictions across the layers. Inspired by these observations, we propose CRaSh, involving Clustering, Removing, and Sharing, a training-free strategy to derive improved emulators from LLMs. CRaSh significantly boosts performance of OFT with billions of parameters. Furthermore, we investigate the optimal solutions yielded by fine-tuning with and without full model through the lens of loss landscape. Our findings demonstrate a linear connectivity among these optima falling over the same basin, thereby highlighting the effectiveness of CRaSh and OFT. The source code is publicly available at https://github.com/TsinghuaC3I/CRaSh.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2310.15477

Country:

Europe (1.00)
Asia (0.93)
North America > United States > Minnesota (0.28)
North America > United States > Washington > King County > Seattle (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Leisure & Entertainment (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine (0.92)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Empowering Private Tutoring by Chaining Large Language Models

Chen, Yulin, Ding, Ning, Zheng, Hai-Tao, Liu, Zhiyuan, Sun, Maosong, Zhou, Bowen

arXiv.org Artificial IntelligenceSep-14-2023

Artificial intelligence has been applied in various aspects of online education to facilitate teaching and learning. However, few approaches has been made toward a complete AI-powered tutoring system. In this work, we explore the development of a full-fledged intelligent tutoring system powered by state-of-the-art large language models (LLMs), covering automatic course planning and adjusting, tailored instruction, and flexible quiz evaluation. To make the system robust to prolonged interaction and cater to individualized education, the system is decomposed into three inter-connected core processes-interaction, reflection, and reaction. Each process is implemented by chaining LLM-powered tools along with dynamically updated memory modules. Tools are LLMs prompted to execute one specific task at a time, while memories are data storage that gets updated during education process. Statistical results from learning logs demonstrate the effectiveness and mechanism of each tool usage. Subjective feedback from human users reveal the usability of each function, and comparison with ablation systems further testify the benefits of the designed processes in long-term interaction.

artificial intelligence, large language model, natural language, (18 more...)

arXiv.org Artificial Intelligence

2309.08112

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre:

Research Report (1.00)
Instructional Material > Course Syllabus & Notes (1.00)

Industry:

Media > Music (1.00)
Education > Educational Technology > Educational Software > Computer Based Training (1.00)
Education > Educational Setting (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Enhancing Chat Language Models by Scaling High-quality Instructional Conversations

Ding, Ning, Chen, Yulin, Xu, Bokai, Qin, Yujia, Zheng, Zhi, Hu, Shengding, Liu, Zhiyuan, Sun, Maosong, Zhou, Bowen

arXiv.org Artificial IntelligenceMay-23-2023

Fine-tuning on instruction data has been widely validated as an effective practice for implementing chat language models like ChatGPT. Scaling the diversity and quality of such data, although straightforward, stands a great chance of leading to improved performance. This paper aims to improve the upper bound of open-source models further. We first provide a systematically designed, diverse, informative, large-scale dataset of instructional conversations, UltraChat, which does not involve human queries. Our objective is to capture the breadth of interactions that a human might have with an AI assistant and employs a comprehensive framework to generate multi-turn conversation iteratively. UltraChat contains 1.5 million high-quality multi-turn dialogues and covers a wide range of topics and instructions. Our statistical analysis of UltraChat reveals its superiority in various key metrics, including scale, average length, diversity, coherence, etc., solidifying its position as a leading open-source dataset. Building upon UltraChat, we fine-tune a LLaMA model to create a powerful conversational model, UltraLLaMA. Our evaluations indicate that UltraLLaMA consistently outperforms other open-source models, including Vicuna, the previously recognized state-of-the-art open-source model. The dataset and the model will be publicly released\footnote{\url{https://github.com/thunlp/UltraChat}}.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2305.14233

Country: North America > United States (0.68)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Health & Medicine > Consumer Health (1.00)
Energy (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback

PaD: Program-aided Distillation Specializes Large Models in Reasoning

Zhu, Xuekai, Qi, Biqing, Zhang, Kaiyan, Long, Xingwei, Zhou, Bowen

arXiv.org Artificial IntelligenceMay-23-2023

While Large Language Models (LLMs) excel in several natural language processing tasks, their size and inaccessibility present challenges for extensive practical application. Previous studies acquire specialized skills through distillation on LLMs, which result in trading generic abilities, called model specialization. As for reasoning ability, chain-of-thought was synthesized to subsequent distillation. However, due to hallucination, synthetic chain-of-thought from LLMs contains faulty reasoning. These incorrect reasoning steps damage the reasoning capability. To tackle above issues, we propose Program-aided Distillation (PaD), which distills LLMs to obtain specialized small models in reasoning tasks. In PaD, we strengthen specialized models with program-aided reasoning, and help them overcome faulty reasoning steps with automated error checking. Experimental results demonstrate that, on the GSM8K benchmark, a 0.06B model using PaD can not only outperform certain LLMs (e.g., LLaMA), but also achieves a 10% improvement over baselines with a significantly smaller scale of parameters and data. Data pruning analysis reveals that PaD possesses higher training efficiency.

artificial intelligence, natural language, small model, (15 more...)

arXiv.org Artificial Intelligence

2305.13888

Country:

Asia > China (0.28)
Asia > Middle East > UAE (0.14)

Genre: Research Report > New Finding (0.67)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Trustworthy AI: From Principles to Practices

Li, Bo, Qi, Peng, Liu, Bo, Di, Shuai, Liu, Jingen, Pei, Jiquan, Yi, Jinfeng, Zhou, Bowen

arXiv.org Artificial IntelligenceOct-3-2021

Fast developing artificial intelligence (AI) technology has enabled various applied systems deployed in the real world, impacting people's everyday lives. However, many current AI systems were found vulnerable to imperceptible attacks, biased against underrepresented groups, lacking in user privacy protection, etc., which not only degrades user experience but erodes the society's trust in all AI systems. In this review, we strive to provide AI practitioners a comprehensive guide towards building trustworthy AI systems. We first introduce the theoretical framework of important aspects of AI trustworthiness, including robustness, generalization, explainability, transparency, reproducibility, fairness, privacy preservation, alignment with human values, and accountability. We then survey leading approaches in these aspects in the industry. To unify the current fragmented approaches towards trustworthy AI, we propose a systematic approach that considers the entire lifecycle of AI systems, ranging from data acquisition to model development, to development and deployment, finally to continuous monitoring and governance. In this framework, we offer concrete action items to practitioners and societal stakeholders (e.g., researchers and regulators) to improve AI trustworthiness. Finally, we identify key opportunities and challenges in the future development of trustworthy AI systems, where we identify the need for paradigm shift towards comprehensive trustworthy AI systems.

data mining, machine learning, reinforcement learning, (31 more...)

arXiv.org Artificial Intelligence

2110.01167

Country:

North America > United States (1.00)
Europe (1.00)

Genre:

Overview (1.00)
Research Report > New Finding (0.46)
Instructional Material > Course Syllabus & Notes (0.45)

Industry:

Transportation (1.00)
Social Sector (1.00)
Law (1.00)
(6 more...)

Technology:

Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.92)

Add feedback

Don't Take It Literally: An Edit-Invariant Sequence Loss for Text Generation

Liu, Guangyi, Yang, Zichao, Tao, Tianhua, Liang, Xiaodan, Li, Zhen, Zhou, Bowen, Cui, Shuguang, Hu, Zhiting

arXiv.org Artificial IntelligenceAug-9-2021

Neural text generation models are typically trained by maximizing log-likelihood with the sequence cross entropy loss, which encourages an exact token-by-token match between a target sequence with a generated sequence. Such training objective is sub-optimal when the target sequence not perfect, e.g., when the target sequence is corrupted with noises, or when only weak sequence supervision is available. To address this challenge, we propose a novel Edit-Invariant Sequence Loss (EISL), which computes the matching loss of a target n-gram with all n-grams in the generated sequence. EISL draws inspirations from convolutional networks (ConvNets) which are shift-invariant to images, hence is robust to the shift of n-grams to tolerate edits in the target sequences. Moreover, the computation of EISL is essentially a convolution operation with target n-grams as kernels, which is easy to implement with existing libraries. To demonstrate the effectiveness of EISL, we conduct experiments on three tasks: machine translation with noisy target sequences, unsupervised text style transfer, and non-autoregressive machine translation. Experimental results show our method significantly outperforms cross entropy loss on these three tasks.

computational linguistics, deep learning, neural network, (22 more...)

arXiv.org Artificial Intelligence

2106.15078

Country:

Europe (1.00)
Asia (1.00)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
(2 more...)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

SGG: Learning to Select, Guide, and Generate for Keyphrase Generation

Zhao, Jing, Bao, Junwei, Wang, Yifan, Wu, Youzheng, He, Xiaodong, Zhou, Bowen

arXiv.org Artificial IntelligenceMay-6-2021

Keyphrases, that concisely summarize the high-level topics discussed in a document, can be categorized into present keyphrase which explicitly appears in the source text, and absent keyphrase which does not match any contiguous subsequence but is highly semantically related to the source. Most existing keyphrase generation approaches synchronously generate present and absent keyphrases without explicitly distinguishing these two categories. In this paper, a Select-Guide-Generate (SGG) approach is proposed to deal with present and absent keyphrase generation separately with different mechanisms. Specifically, SGG is a hierarchical neural network which consists of a pointing-based selector at low layer concentrated on present keyphrase generation, a selection-guided generator at high layer dedicated to absent keyphrase generation, and a guider in the middle to transfer information from selector to generator. Experimental results on four keyphrase generation benchmarks demonstrate the effectiveness of our model, which significantly outperforms the strong baselines for both present and absent keyphrases generation. Furthermore, we extend SGG to a title generation task which indicates its extensibility in natural language generation tasks.

deep learning, keyphrase, neural network, (21 more...)

arXiv.org Artificial Intelligence

2105.02544

Country: Asia > China (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback

Relation Module for Non-answerable Prediction on Question Answering

Huang, Kevin, Tang, Yun, Huang, Jing, He, Xiaodong, Zhou, Bowen

arXiv.org Artificial IntelligenceOct-23-2019

Machine reading comprehension(MRC) has attracted significant amounts of research attention recently, due to an increase of challenging reading comprehension datasets. In this paper, we aim to improve a MRC model's ability to determine whether a question has an answer in a given context (e.g. the recently proposed SQuAD 2.0 task). Our solution is a relation module that is adaptable to any MRC model. The relation module consists of both semantic extraction and relational information. We first extract high level semantics as objects from both question and context with multi-head self-attentive pooling. These semantic objects are then passed to a relation network, which generates relationship scores for each object pair in a sentence. These scores are used to determine whether a question is non-answerable. We test the relation module on the SQuAD 2.0 dataset using both BiDAF and BERT models as baseline readers. We obtain 1.8% gain of F1 on top of the BiDAF reader, and 1.0% on top of the BERT base model. These results show the effectiveness of our relation module on MRC

artificial intelligence, object-oriented architecture, relation module, (19 more...)

arXiv.org Artificial Intelligence

1910.10843

Country: North America > United States > California (0.28)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.42)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.35)

Add feedback

Multiple instance learning with graph neural networks

Tu, Ming, Huang, Jing, He, Xiaodong, Zhou, Bowen

arXiv.org Machine LearningJun-11-2019

Multiple instance learning (MIL) aims to learn the mapping between a bag of instances and the bag-level label. In this paper, we propose a new end-to-end graph neural network (GNN) based algorithm for MIL: we treat each bag as a graph and use GNN to learn the bag embedding, in order to explore the useful structural information among instances in bags. The final graph representation is fed into a classifier for label prediction. Our algorithm is the first attempt to use GNN for MIL. We empirically show that the proposed algorithm achieves the state of the art performance on several popular MIL data sets without losing model interpretability.

algorithm, deep learning, neural network, (18 more...)

arXiv.org Machine Learning

1906.04881

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)

Add feedback

Improving the Robustness of Deep Neural Networks via Adversarial Training with Triplet Loss

Li, Pengcheng, Yi, Jinfeng, Zhou, Bowen, Zhang, Lijun

arXiv.org Machine LearningMay-28-2019

Recent studies have highlighted that deep neural networks (DNNs) are vulnerable to adversarial examples. In this paper, we improve the robustness of DNNs by utilizing techniques of Distance Metric Learning. Specifically, we incorporate Triplet Loss, one of the most popular Distance Metric Learning methods, into the framework of adversarial training. Our proposed algorithm, Adversarial Training with Triplet Loss (AT$^2$L), substitutes the adversarial example against the current model for the anchor of triplet loss to effectively smooth the classification boundary. Furthermore, we propose an ensemble version of AT$^2$L, which aggregates different attack methods and model structures for better defense effects. Our empirical studies verify that the proposed approach can significantly improve the robustness of DNNs without sacrificing accuracy. Finally, we demonstrate that our specially designed triplet loss can also be used as a regularization term to enhance other defense methods.

adversarial example, deep learning, neural network, (20 more...)

arXiv.org Machine Learning

1905.11713

Country: Asia (0.14)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (0.94)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback