AITopics | Yang, Li

Plotting

Yang, Li

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

MetaGater: Fast Learning of Conditional Channel Gated Networks via Federated Meta-Learning

Lin, Sen, Yang, Li, He, Zhezhi, Fan, Deliang, Zhang, Junshan

arXiv.org Artificial IntelligenceNov-28-2020

While deep learning has achieved phenomenal successes in many AI applications, its enormous model size and intensive computation requirements pose a formidable challenge to the deployment in resource-limited nodes. There has recently been an increasing interest in computationally-efficient learning methods, e.g., quantization, pruning and channel gating. However, most existing techniques cannot adapt to different tasks quickly. In this work, we advocate a holistic approach to jointly train the backbone network and the channel gating which enables dynamical selection of a subset of filters for more efficient local computation given the data input. Particularly, we develop a federated meta-learning approach to jointly learn good meta-initializations for both backbone networks and gating modules, by making use of the model similarity across learning tasks on different nodes. In this way, the learnt meta-gating module effectively captures the important filters of a good meta-backbone network, based on which a task-specific conditional channel gated network can be quickly adapted, i.e., through one-step gradient descent, from the meta-initializations in a two-stage procedure using new samples of that task. The convergence of the proposed federated meta-learning algorithm is established under mild conditions. Experimental results corroborate the effectiveness of our method in comparison to related work.

deep learning, module, neural network, (15 more...)

arXiv.org Artificial Intelligence

2011.12511

Country: North America > United States (0.14)

Genre:

Research Report (0.64)
Overview (0.46)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

KSM: Fast Multiple Task Adaption via Kernel-wise Soft Mask Learning

Yang, Li, He, Zhezhi, Zhang, Junshan, Fan, Deliang

arXiv.org Artificial IntelligenceSep-11-2020

Deep Neural Networks (DNN) could forget the knowledge about earlier tasks when learning new tasks, and this is known as \textit{catastrophic forgetting}. While recent continual learning methods are capable of alleviating the catastrophic problem on toy-sized datasets, some issues still remain to be tackled when applying them in real-world problems. Recently, the fast mask-based learning method (e.g. piggyback \cite{mallya2018piggyback}) is proposed to address these issues by learning only a binary element-wise mask in a fast manner, while keeping the backbone model fixed. However, the binary mask has limited modeling capacity for new tasks. A more recent work \cite{hung2019compacting} proposes a compress-grow-based method (CPG) to achieve better accuracy for new tasks by partially training backbone model, but with order-higher training cost, which makes it infeasible to be deployed into popular state-of-the-art edge-/mobile-learning. The primary goal of this work is to simultaneously achieve fast and high-accuracy multi task adaption in continual learning setting. Thus motivated, we propose a new training method called \textit{kernel-wise Soft Mask} (KSM), which learns a kernel-wise hybrid binary and real-value soft mask for each task, while using the same backbone model. Such a soft mask can be viewed as a superposition of a binary mask and a properly scaled real-value tensor, which offers a richer representation capability without low-level kernel support to meet the objective of low hardware overhead. We validate KSM on multiple benchmark datasets against recent state-of-the-art methods (e.g. Piggyback, Packnet, CPG, etc.), which shows good improvement in both accuracy and training cost.

artificial intelligence, backbone model, neural network, (18 more...)

arXiv.org Artificial Intelligence

2009.05668

Country: North America > United States > Arizona (0.14)

Genre: Research Report (0.84)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

On Hyperparameter Optimization of Machine Learning Algorithms: Theory and Practice

Yang, Li, Shami, Abdallah

arXiv.org Machine LearningAug-7-2020

Machine learning algorithms have been used widely in various applications and areas. To fit a machine learning model into different problems, its hyper-parameters must be tuned. Selecting the best hyper-parameter configuration for machine learning models has a direct impact on the model's performance. It often requires deep knowledge of machine learning algorithms and appropriate hyper-parameter optimization techniques. Although several automatic optimization techniques exist, they have different strengths and drawbacks when applied to different types of problems. In this paper, optimizing the hyper-parameters of common machine learning models is studied. We introduce several state-of-the-art optimization techniques and discuss how to apply them to machine learning algorithms. Many available libraries and frameworks developed for hyper-parameter optimization problems are provided, and some open challenges of hyper-parameter optimization research are also discussed in this paper. Moreover, experiments are conducted on benchmark datasets to compare the performance of different optimization methods and provide practical examples of hyper-parameter optimization. This survey paper will help industrial users, data analysts, and researchers to better develop machine learning models by identifying the proper hyper-parameter configurations effectively.

algorithm, deep learning, neural network, (21 more...)

arXiv.org Machine Learning

doi: 10.1016/j.neucom.2020.07.061

2007.15745

Country:

North America > United States (0.67)
North America > Canada > Ontario > Middlesex County > London (0.14)

Genre:

Overview (1.00)
Research Report > New Finding (0.46)

Industry:

Education (0.92)
Information Technology > Security & Privacy (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(4 more...)

Add feedback

Big Bird: Transformers for Longer Sequences

Zaheer, Manzil, Guruganesh, Guru, Dubey, Avinava, Ainslie, Joshua, Alberti, Chris, Ontanon, Santiago, Pham, Philip, Ravula, Anirudh, Wang, Qifan, Yang, Li, Ahmed, Amr

arXiv.org Machine LearningJul-28-2020

Transformers-based models, such as BERT, have been one of the most successful deep learning models for NLP. Unfortunately, one of their core limitations is the quadratic dependency (mainly in terms of memory) on the sequence length due to their full attention mechanism. To remedy this, we propose, BigBird, a sparse attention mechanism that reduces this quadratic dependency to linear. We show that BigBird is a universal approximator of sequence functions and is Turing complete, thereby preserving these properties of the quadratic, full attention model. Along the way, our theoretical analysis reveals some of the benefits of having $O(1)$ global tokens (such as CLS), that attend to the entire sequence as part of the sparse attention mechanism. The proposed sparse attention can handle sequences of length up to 8x of what was previously possible using similar hardware. As a consequence of the capability to handle longer context, BigBird drastically improves performance on various NLP tasks such as question answering and summarization. We also propose novel applications to genomics data.

arxiv preprint arxiv, deep learning, neural network, (19 more...)

arXiv.org Machine Learning

2007.14062

Country: North America > United States (0.67)

Genre:

Research Report (0.64)
Workflow (0.46)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Tacit knowledge mining algorithm based on linguistic truth-valued concept lattice

Yang, Li, Wang, Yuhui

arXiv.org Artificial IntelligenceJan-15-2012

This paper is the continuation of our research work about linguistic truth-valued concept lattice. In order to provide a mathematical tool for mining tacit knowledge, we establish a concrete model of 6-ary linguistic truth-valued concept lattice and introduce a mining algorithm through the structure consistency. Specifically, we utilize the attributes to depict knowledge, propose the 6-ary linguistic truth-valued attribute extended context and congener context to characterize tacit knowledge, and research the necessary and sufficient conditions of forming tacit knowledge. We respectively give the algorithms of generating the linguistic truth-valued congener context and constructing the linguistic truth-valued concept lattice.

concept lattice, fuzzy logic, neural network, (14 more...)

arXiv.org Artificial Intelligence

1201.3107

Country: Asia > China (0.69)

Technology:

Information Technology > Knowledge Management (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.47)

Add feedback