Merak: An Efficient Distributed DNN Training Framework with Automated 3D Parallelism for Giant Foundation Models
Lai, Zhiquan, Li, Shengwei, Tang, Xudong, Ge, Keshi, Liu, Weijie, Duan, Yabo, Qiao, Linbo, Li, Dongsheng
Foundation models are becoming the dominant deep learning technology. Pretraining a foundation model is always time-consuming due to the large scale of both the model parameters and the training dataset. Besides being compute-intensive, the training process is extremely memory-intensive and communication-intensive. These features make it necessary to apply 3D parallelism, which integrates data parallelism, pipeline model parallelism, and tensor model parallelism, to achieve high training efficiency. To achieve this goal, custom software frameworks such as Megatron-LM and DeepSpeed have been developed. However, current 3D parallelism frameworks still face two issues: i) they are not transparent to model developers, who must manually modify their models to parallelize training; ii) their utilization of computation, GPU memory, and network bandwidth is insufficient. We propose Merak, an automated 3D parallelism deep learning training framework with high resource utilization. Merak deploys automatically via an automatic model partitioner, which uses a graph sharding algorithm on a proxy representation of the model. Merak also provides a non-intrusive API for scaling out foundation model training with minimal code modification. In addition, we design a high-performance 3D parallel runtime engine in Merak. It uses several techniques to exploit available training resources, including a shifted critical path pipeline schedule that brings higher computation utilization, stage-aware recomputation that makes use of idle worker memory, and sub-pipelined tensor model parallelism that overlaps communication and computation. Experiments on 64 GPUs show that Merak speeds up training of models with 1.5, 2.5, 8.3, and 20 billion parameters by up to 1.42X, 1.39X, 1.43X, and 1.61X, respectively, over state-of-the-art 3D parallelism frameworks.
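To make the 3D-parallel setup concrete, the sketch below shows one common way to carve a GPU cluster into tensor-, pipeline-, and data-parallel process groups. The rank ordering (tensor parallelism innermost) and the example degrees (tp=2, pp=4, dp=8 on 64 GPUs) are illustrative assumptions in the style of Megatron-LM-like frameworks; the abstract does not specify Merak's actual rank mapping.

```python
from itertools import product

def build_3d_groups(world_size, tp, pp, dp):
    """Partition world_size ranks into tensor-, pipeline-, and data-parallel
    groups, assuming the conventional ordering: tp innermost, then dp, then pp.
    This is a sketch of the usual 3D-parallel layout, not Merak's actual one."""
    assert world_size == tp * pp * dp
    rank_of = lambda p, d, t: p * dp * tp + d * tp + t

    # Ranks in the same tp group split one layer's tensors among themselves.
    tp_groups = [[rank_of(p, d, t) for t in range(tp)]
                 for p, d in product(range(pp), range(dp))]
    # Ranks in the same dp group hold replicas and all-reduce gradients.
    dp_groups = [[rank_of(p, d, t) for d in range(dp)]
                 for p, t in product(range(pp), range(tp))]
    # Ranks in the same pp group form one pipeline of consecutive stages.
    pp_groups = [[rank_of(p, d, t) for p in range(pp)]
                 for d, t in product(range(dp), range(tp))]
    return tp_groups, dp_groups, pp_groups

# Example: 64 GPUs, matching the scale of the experiments above
# (the actual parallel degrees used there are not stated).
tp_g, dp_g, pp_g = build_3d_groups(64, tp=2, pp=4, dp=8)
```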
Modeling Fine-grained Information via Knowledge-aware Hierarchical Graph for Zero-shot Entity Retrieval
Wu, Taiqiang, Bai, Xingyu, Guo, Weigang, Liu, Weijie, Li, Siheng, Yang, Yujiu
Zero-shot entity retrieval, which aims to link mentions to candidate entities under the zero-shot setting, is vital for many tasks in Natural Language Processing. Most existing methods represent mentions/entities via sentence embeddings of the corresponding context from a Pre-trained Language Model. However, we argue that such coarse-grained sentence embeddings cannot fully model the mentions/entities, especially when the attention scores towards mentions/entities are relatively low. In this work, we propose GER, a \textbf{G}raph enhanced \textbf{E}ntity \textbf{R}etrieval framework, to capture more fine-grained information as a complement to sentence embeddings. We extract knowledge units from the corresponding context and then construct a mention/entity-centralized graph. Hence, we can learn fine-grained information about the mention/entity by aggregating information from these knowledge units. To avoid the graph information bottleneck at the central mention/entity node, we construct a hierarchical graph and design a novel Hierarchical Graph Attention Network~(HGAN). Experimental results on popular benchmarks demonstrate that our proposed GER framework performs better than previous state-of-the-art models. The code is available at https://github.com/wutaiqiang/GER-WSDM2023.
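As a rough illustration of the aggregation idea, the sketch below implements a single attention hop that pools knowledge-unit embeddings into a central mention/entity node and adds the result to its coarse sentence embedding. This is a simplified, single-level stand-in written for this summary; the paper's HGAN is hierarchical, stacking levels precisely to avoid the bottleneck at the central node.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CentralNodeAttention(nn.Module):
    """One attention hop aggregating knowledge units into a central node.
    A simplified stand-in for one level of the paper's HGAN, not its API."""

    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)  # projects the central node (query)
        self.k = nn.Linear(dim, dim)  # projects knowledge units (keys)
        self.v = nn.Linear(dim, dim)  # projects knowledge units (values)

    def forward(self, center, units):
        # center: (d,) mention/entity embedding; units: (n, d) knowledge units.
        scores = self.k(units) @ self.q(center) / units.size(-1) ** 0.5  # (n,)
        alpha = F.softmax(scores, dim=0)
        # Fine-grained signal: attention-weighted sum of knowledge units,
        # added residually to the coarse sentence embedding of the center.
        return center + alpha @ self.v(units)

# Usage: layer = CentralNodeAttention(768)
# out = layer(torch.randn(768), torch.randn(5, 768))
```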
A Bidirectional Tree Tagging Scheme for Joint Medical Relation Extraction
Luo, Xukun, Liu, Weijie, Ma, Meng, Wang, Ping
Joint medical relation extraction refers to extracting triples, composed of entities and relations, from medical text with a single model. One solution is to convert this task into a sequential tagging task. However, in existing works, methods that represent and tag the triples in a linear way fail to handle overlapping triples, and methods that organize the triples as a graph face the challenge of large computational effort. In this paper, inspired by the tree-like relation structures in medical text, we propose a novel scheme called Bidirectional Tree Tagging (BiTT) to form the medical relation triples into two binary trees and convert the trees into a word-level tag sequence. Based on the BiTT scheme, we develop a joint relation extraction model to predict the BiTT tags and further extract medical triples efficiently. Our model outperforms the best baselines by 2.0\% and 2.5\% in F1 score on two medical datasets. Moreover, models using our BiTT scheme also obtain promising results on three public datasets from other domains.
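The toy example below shows the overlapping-triple problem that motivates tree-shaped tagging: a single flat tag per token cannot encode two triples that share an entity, whereas grouping triples by shared entities naturally yields a tree. The grouping code is only an illustration of the idea; the actual BiTT tag format is defined in the paper and not reproduced here.

```python
from collections import defaultdict

# Two triples overlap on the head entity "aspirin"; a linear BIO-style
# scheme would need the token "aspirin" to carry two roles at once.
triples = [
    ("aspirin", "treats", "headache"),
    ("aspirin", "causes", "stomach upset"),
]

# Grouping by head entity turns each shared head into the root of a
# tree-like relation structure, which BiTT then serializes into
# word-level tags (serialization details omitted).
tree = defaultdict(list)
for head, rel, tail in triples:
    tree[head].append((rel, tail))

print(dict(tree))
# {'aspirin': [('treats', 'headache'), ('causes', 'stomach upset')]}
```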
Whitening Sentence Representations for Better Semantics and Faster Retrieval
Su, Jianlin, Cao, Jiarun, Liu, Weijie, Ou, Yangyiwen
Pre-trained models such as BERT have achieved great success in many natural language processing tasks. However, how to obtain better sentence representations from these pre-trained models remains worth exploring. Previous work has shown that the anisotropy problem is a critical bottleneck for BERT-based sentence representations, hindering the model from fully utilizing the underlying semantic features. Therefore, some attempts at boosting the isotropy of the sentence distribution, such as flow-based models, have been applied to sentence representations and achieved some improvement. In this paper, we find that the whitening operation from traditional machine learning can similarly enhance the isotropy of sentence representations and achieve competitive results. Furthermore, the whitening technique is also capable of reducing the dimensionality of the sentence representations. Our experimental results show that it can not only achieve promising performance but also significantly reduce storage costs and accelerate model retrieval speed.
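A minimal sketch of the whitening operation described here, following the standard formulation: subtract the mean, then rotate and rescale by the inverse square root of the covariance's eigenvalues so the transformed embeddings have identity covariance. It assumes NumPy arrays of BERT sentence embeddings; the embedding dimension (768) and reduced size (256) in the usage note are illustrative, not taken from the paper.

```python
import numpy as np

def compute_whitening(embeddings, n_components=None):
    """Fit a whitening transform (mu, W) on sentence embeddings.

    After the transform, (x - mu) @ W has approximately zero mean and
    identity covariance, i.e. the embedding space becomes isotropic.
    Keeping only the first n_components columns of W also reduces
    dimensionality, which cuts storage and speeds up retrieval.
    """
    mu = embeddings.mean(axis=0, keepdims=True)   # (1, d) mean vector
    cov = np.cov((embeddings - mu).T)             # (d, d) covariance matrix
    U, S, _ = np.linalg.svd(cov)                  # cov = U diag(S) U^T
    W = U @ np.diag(1.0 / np.sqrt(S))             # whitening matrix
    if n_components is not None:
        W = W[:, :n_components]                   # optional dim. reduction
    return mu, W

def whiten(embeddings, mu, W):
    return (embeddings - mu) @ W

# Usage: vecs is an (N, 768) array of BERT sentence embeddings.
# mu, W = compute_whitening(vecs, n_components=256)
# vecs_white = whiten(vecs, mu, W)
```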