Yang, Fang
A Navigation System for ROV Inspection of Fish Net Cages
Ge, Zhikang, Yang, Fang, Lu, Wenwu, Wei, Peng, Ying, Yibin, Peng, Chen
In this paper, we retrofit an off-the-shelf ROV, the BlueROV2, with a ROS-based software framework and develop a localization module, a path planning system, and a control framework. For real-time local localization, we employ the open-source TagSLAM library. Additionally, we propose a control strategy based on a Nominal Feedback Controller (NFC) to achieve precise trajectory tracking. The proposed system has been implemented and validated through experiments in a controlled laboratory environment, demonstrating its effectiveness and its potential for real-world deployment.
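The abstract does not specify the NFC's internal design, so the following is only a rough sketch of the generic nominal-plus-feedback structure such a trajectory-tracking controller typically takes: a feedforward (nominal) input corrected by state-error feedback. The gains Kp and Kd, the 4-DOF state layout, and the function name are all hypothetical.

import numpy as np

# Hypothetical gains; the paper's actual NFC design, gains, and state
# layout are not given in the abstract.
Kp = np.diag([2.0, 2.0, 2.0, 1.0])  # assumed pose gains (x, y, z, yaw)
Kd = np.diag([0.5, 0.5, 0.5, 0.2])  # assumed velocity gains

def nfc_control(x, x_ref, u_nominal):
    # x, x_ref: [x, y, z, yaw, vx, vy, vz, yaw_rate]; u_nominal: feedforward thrust.
    e_pose = x_ref[:4] - x[:4]  # pose tracking error
    e_vel = x_ref[4:] - x[4:]   # velocity tracking error
    return u_nominal + Kp @ e_pose + Kd @ e_vel

# Example: hold station at the origin with zero nominal thrust.
x = np.zeros(8); x[0] = 0.5                      # ROV drifted 0.5 m in x
print(nfc_control(x, np.zeros(8), np.zeros(4)))  # -> corrective thrust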
Efficiently Training 7B LLM with 1 Million Sequence Length on 8 GPUs
Zhao, Pinxue, Zhang, Hailin, Fu, Fangcheng, Nie, Xiaonan, Liu, Qibin, Yang, Fang, Peng, Yuanbo, Jiao, Dian, Li, Shuaipeng, Xue, Jinbao, Tao, Yangyu, Cui, Bin
Large Language Models (LLMs) are increasingly trained with extended context lengths to enable more sophisticated applications. However, long-context training poses great challenges given the constraints of GPU memory: it not only leads to substantial activation memory consumption during training, but also incurs considerable memory fragmentation. To facilitate long-context training, existing frameworks have adopted strategies such as recomputation and various forms of parallelism. Nevertheless, these techniques rely on redundant computation or extensive communication, resulting in low Model FLOPS Utilization (MFU). In this paper, we propose MEMO, a novel LLM training framework designed for fine-grained activation memory management. Because computation scales quadratically and memory scales linearly with sequence length when using FlashAttention, we offload memory-consuming activations to CPU memory after each layer's forward pass and fetch them back during the backward pass. To maximize the swapping of activations without hindering computation, and to avoid exhausting limited CPU memory, we implement a token-wise activation recomputation and swapping mechanism. Furthermore, we tackle the memory fragmentation issue with a bi-level Mixed Integer Programming (MIP) approach that optimizes the reuse of memory across transformer layers. Empirical results demonstrate that MEMO achieves, on average, 2.42x and 2.26x the MFU of Megatron-LM and DeepSpeed, respectively. This improvement stems from MEMO's ability to minimize memory fragmentation, reduce recomputation and intensive communication, and avoid the delays of memory reorganization caused by fragmentation. By leveraging fine-grained activation memory management, MEMO enables efficient training of a 7B LLM with a 1-million-token sequence length on just 8 A800 GPUs, achieving an MFU of 52.30%.
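MEMO's token-wise swap/recompute scheduler and MIP-based memory planner are beyond an abstract-sized sketch, but the core offload-after-forward, fetch-before-backward idea can be illustrated with PyTorch's saved-tensor hooks. This is a generic sketch, not MEMO's implementation; the model and tensor shapes are placeholders.

import torch
import torch.nn as nn

def pack_to_cpu(t):
    # Offload each saved activation to pinned CPU memory as soon as the
    # forward pass saves it. Naive: parameters get offloaded too, and the
    # copy blocks; real systems overlap transfers with compute on
    # separate CUDA streams.
    buf = torch.empty(t.size(), dtype=t.dtype, pin_memory=True)
    buf.copy_(t)
    return buf

def unpack_to_gpu(buf):
    # Fetch the activation back to the GPU when backward needs it.
    return buf.to("cuda", non_blocking=True)

model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024)).cuda()
x = torch.randn(8, 1024, device="cuda", requires_grad=True)

with torch.autograd.graph.saved_tensors_hooks(pack_to_cpu, unpack_to_gpu):
    loss = model(x).sum()  # activations saved during forward go to CPU
loss.backward()            # activations are fetched back on demand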
Improving Sequential Recommendation Models with an Enhanced Loss Function
Li, Fangyu, Yu, Shenbao, Zeng, Feng, Yang, Fang
There has been growing interest in benchmarking sequential recommendation models and in reproducing and improving existing models. For example, Rendle et al. improved matrix factorization models by tuning their parameters and hyperparameters, and Petrov and Macdonald developed a more efficient and effective implementation of BERT4Rec, resolving inconsistencies in earlier performance comparisons between BERT4Rec and SASRec. Notably, BERT4Rec and SASRec share a similar network structure; the main difference lies in their training objective (loss function). We therefore analyze the advantages and disadvantages of the loss functions commonly used in sequential recommendation and propose an improved loss function that leverages their strengths. Extensive experiments on two influential open-source libraries demonstrate that our improved loss function significantly enhances the performance of the GRU4Rec, SASRec, SR-GNN, and S3Rec models. Furthermore, the improved SASRec benchmark outperforms BERT4Rec on the ML-1M and Beauty datasets and achieves results similar to BERT4Rec on the ML-20M and Steam datasets. We also reproduce the results of the BERT4Rec model on the Beauty dataset. Finally, we provide a comprehensive explanation of the effectiveness of our improved loss function through experiments. Our code is publicly available at https://github.com/Li-fAngyU/sequential_rec.
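The improved loss itself is left to the paper and the linked code; for context only, here is a sketch of the two commonly used losses the abstract contrasts, as typically implemented: SASRec-style sampled binary cross-entropy over one positive and one sampled negative item, and BERT4Rec-style full softmax cross-entropy over the whole catalog. Names, shapes, and sizes below are illustrative.

import torch
import torch.nn.functional as F

num_items, d = 10000, 64
item_emb = torch.nn.Embedding(num_items, d)
h = torch.randn(32, d)                     # hidden state per sequence position
pos = torch.randint(0, num_items, (32,))   # ground-truth next items
neg = torch.randint(0, num_items, (32,))   # sampled negatives

# SASRec-style sampled BCE: score one positive vs. one sampled negative.
pos_logit = (h * item_emb(pos)).sum(-1)
neg_logit = (h * item_emb(neg)).sum(-1)
bce = (F.binary_cross_entropy_with_logits(pos_logit, torch.ones_like(pos_logit))
       + F.binary_cross_entropy_with_logits(neg_logit, torch.zeros_like(neg_logit)))

# BERT4Rec-style full cross-entropy: softmax over the entire item catalog.
logits = h @ item_emb.weight.T             # (32, num_items)
ce = F.cross_entropy(logits, pos)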
Kernel Induced Random Survival Forests
Yang, Fang, Wang, Jiheng, Fan, Guangzhe
Kernel Induced Random Survival Forests (KIRSF) is a statistical learning algorithm that aims to improve prediction accuracy for survival data. As in Random Survival Forests (RSF), a cumulative hazard function is predicted for each individual in the test set. Prediction error is estimated using Harrell's concordance index (C-index) [Harrell et al. (1982)]. The C-index can be interpreted as a misclassification probability and does not depend on a single fixed evaluation time; it also specifically accounts for censoring. By utilizing kernel functions, KIRSF achieves better results than RSF in many situations. In this report, we show how to incorporate kernel functions into RSF, test the performance of KIRSF, and compare our method to RSF. We find that KIRSF outperforms RSF in many settings.
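As a reference for how the reported prediction error is computed, here is a minimal implementation of Harrell's C-index under right censoring: a pair (i, j) is comparable when the subject with the shorter observed time experienced an event, and concordant when that subject also has the higher predicted risk. Variable names are mine; this is not the authors' code.

def harrell_c_index(times, events, risks):
    # times: observed times; events: 1 if event, 0 if censored;
    # risks: predicted risk scores (higher = expected to fail earlier).
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Pair is usable only if i had an event strictly before j's time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1    # higher risk, earlier event: concordant
                elif risks[i] == risks[j]:
                    concordant += 0.5  # ties in risk count as half
    return concordant / comparable

# Toy example: three subjects, the last one censored.
print(harrell_c_index([2.0, 5.0, 8.0], [1, 1, 0], [0.9, 0.4, 0.1]))  # -> 1.0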