Sun-Shine: A Large Language Model for Tibetan Culture
Huang, Cheng, Gao, Fan, Tashi, Nyima, Liu, Yutong, Wang, Xiangxiang, Tsering, Thupten, Ma-bao, Ban, Duojie, Renzeg, Luosang, Gadeng, Dongrub, Rinchen, Tashi, Dorje, Feng, Xiao, Yu, Yongbin
Tibetan, a minority language in China, features a highly intricate grammatical structure, characterized by four verb tenses and frequent irregularities that give rise to extensive inflectional diversity. Recently, advances in Large Language Models (LLMs) have transformed the paradigm in many domains. Despite this success in other fields, current LLMs often fall short of meeting the needs of Tibetan users, and the potential of LLMs for Tibetan culture remains under-explored. The intrinsic reasons are the immensity and intricacy of Tibetan culture and the need for finer granularity and richer knowledge. At the same time, the complexity and uniqueness of Tibetan grammar, coupled with the language's minority status, lead to data scarcity, which remains a fundamental challenge. To alleviate these issues, we introduce Llama-Sunshine (Sun-Shine), the first large language model for Tibetan culture, which excels at a range of Tibetan language processing tasks. Sun-Shine incorporates state-of-the-art model architectures optimized for Tibetan's linguistic features. We also propose TIB-STC, the first large-scale dataset for Tibetan culture, comprising diverse Tibetan texts such as literature, religious scripts, news, and conversational data. Through comprehensive experiments, Sun-Shine not only demonstrates a higher level of knowledge expertise for Tibetan culture but also gains preliminary proficiency in Tibetan language processing tasks such as language modeling, text classification, machine translation, and syntactic analysis. Moreover, it excels in low-resource scenarios, showcasing strong generalization capabilities.
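The abstract does not specify an interface, but since Sun-Shine is Llama-based, a minimal sketch of how such a checkpoint could be queried with the Hugging Face transformers API may help; the model ID and prompt below are illustrative placeholders, not names from the paper:

```python
# Minimal sketch of querying a Llama-style causal LM with Hugging Face
# transformers. "sun-shine-tibetan" is a hypothetical placeholder ID, not a
# published hub name from the abstract.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sun-shine-tibetan"  # hypothetical checkpoint ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "བོད་ཀྱི་རིག་གནས།"  # a Tibetan-script prompt ("Tibetan culture")
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```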
TLUE: A Tibetan Language Understanding Evaluation Benchmark
Gao, Fan, Huang, Cheng, Tashi, Nyima, Wang, Xiangxiang, Tsering, Thupten, Ma-bao, Ban, Duojie, Renzeg, Luosang, Gadeng, Dongrub, Rinchen, Tashi, Dorje, Feng, Xiao, Yu, Yongbin
Large language models (LLMs) have made tremendous progress in recent years, but low-resource languages, such as Tibetan, remain significantly underrepresented in their evaluation. Despite being spoken by over seven million people, Tibetan has largely been neglected in the development and assessment of LLMs. To address this gap, we present TLUE (Tibetan Language Understanding Evaluation), the first large-scale benchmark for assessing LLMs' capabilities in Tibetan. TLUE comprises two major components: (1) a comprehensive multi-task understanding benchmark spanning 5 domains and 67 subdomains, and (2) a safety benchmark covering 7 subdomains. We evaluate a diverse set of state-of-the-art LLMs. Experimental results demonstrate that most LLMs perform below the random baseline, highlighting the considerable challenges LLMs face in processing Tibetan, a low-resource language. TLUE provides an essential foundation for driving future research and progress in Tibetan language understanding and underscores the need for greater inclusivity in LLM development.
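For intuition, the headline finding that many models score below the random baseline reduces to a simple comparison against chance accuracy. A sketch under an assumed four-way multiple-choice format (the item schema is hypothetical; TLUE's actual layout may differ):

```python
# Minimal sketch of scoring a multiple-choice benchmark against its random
# baseline. The item format here is an assumption, not TLUE's real schema.
def accuracy(predictions, items):
    correct = sum(p == item["answer"] for p, item in zip(predictions, items))
    return correct / len(items)

items = [
    {"question": "...", "choices": ["A", "B", "C", "D"], "answer": "B"},
    {"question": "...", "choices": ["A", "B", "C", "D"], "answer": "D"},
]
predictions = ["B", "A"]  # model outputs, one choice letter per item

acc = accuracy(predictions, items)
random_baseline = 1 / len(items[0]["choices"])  # 0.25 for 4-way choices
print(f"accuracy={acc:.2f}, random baseline={random_baseline:.2f}")
```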
An Advantage-based Optimization Method for Reinforcement Learning in Large Action Space
Lin, Hai, Huang, Cheng, Chen, Zhihong
Reinforcement learning tasks in real-world scenarios often involve large, high-dimensional action spaces, leading to challenges such as convergence difficulties, instability, and high computational complexity. It is widely acknowledged that traditional value-based reinforcement learning algorithms struggle to address these issues effectively. A prevalent approach is to generate independent sub-actions within each dimension of the action space; however, this introduces bias and hinders the learning of optimal policies. In this paper, we propose an advantage-based optimization method and an algorithm named Advantage Branching Dueling Q-network (ABQ). ABQ incorporates a baseline mechanism to tune the action value of each dimension, leveraging the advantage relationship across different sub-actions, so that the learned policy can be optimized per dimension. Empirical results demonstrate that ABQ outperforms BDQ, achieving 3%, 171%, and 84% more cumulative reward in the HalfCheetah, Ant, and Humanoid environments, respectively. Furthermore, ABQ is competitive with two continuous-action baseline algorithms, DDPG and TD3.
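To make the branching setup concrete, here is a sketch of the per-dimension dueling aggregation that branching Q-networks (BDQ) use and that ABQ refines with its advantage-based baseline; the exact ABQ formulation is not given in the abstract, so this shows only the shared skeleton:

```python
# Sketch of per-dimension dueling aggregation in a branching Q-network:
# each action dimension d gets Q_d(s, a_d) = V(s) + A_d(s, a_d) - mean(A_d).
# ABQ's specific baseline mechanism is not specified here and is omitted.
import torch

def branch_q_values(value, advantages):
    """value: (batch, 1) shared state value V(s).
    advantages: list of (batch, n_d) tensors, one per action dimension d.
    Returns per-dimension Q-values with the mean advantage as baseline."""
    return [value + a - a.mean(dim=1, keepdim=True) for a in advantages]

value = torch.randn(32, 1)                           # V(s) for 32 states
advantages = [torch.randn(32, 5) for _ in range(4)]  # 4 dims, 5 sub-actions
qs = branch_q_values(value, advantages)
greedy_action = [q.argmax(dim=1) for q in qs]        # one sub-action per dim
```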
TCP-Diffusion: A Multi-modal Diffusion Model for Global Tropical Cyclone Precipitation Forecasting with Change Awareness
Huang, Cheng, Mu, Pan, Bai, Cong, Watson, Peter AG
Precipitation from tropical cyclones (TCs) can cause disasters such as flooding, mudslides, and landslides. Predicting such precipitation in advance is crucial, giving people time to prepare for and defend against these precipitation-induced disasters. Deep learning (DL) rainfall prediction methods offer a new way to anticipate potential disasters. However, most existing methods suffer from cumulative errors and lack physical consistency, and they overlook the importance of meteorological factors in TC rainfall and their integration with numerical weather prediction (NWP) models. We therefore propose Tropical Cyclone Precipitation Diffusion (TCP-Diffusion), a multi-modal model for global tropical cyclone precipitation forecasting. It forecasts TC rainfall around the TC center for the next 12 hours at 3-hour resolution, based on past rainfall observations and multi-modal environmental variables. Adjacent residual prediction (ARP) changes the training target from the absolute rainfall value to the rainfall trend, endowing the model with rainfall change awareness, reducing cumulative errors, and ensuring physical consistency. To account for TC-related meteorological factors and the useful information in NWP forecasts, we introduce a multi-modal framework with specialized encoders that extract richer information from environmental variables and NWP model outputs. Extensive experiments show that our method outperforms other DL methods and the NWP method of the European Centre for Medium-Range Weather Forecasts (ECMWF).
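The core of adjacent residual prediction is that the network learns frame-to-frame rainfall differences rather than absolute fields, and forecasts are recovered by accumulating the predicted residuals. A minimal sketch with illustrative array shapes (the data pipeline here is an assumption, not the paper's code):

```python
# Minimal sketch of adjacent residual prediction (ARP): train on adjacent
# rainfall differences ("trend"), then rebuild absolute rainfall by
# accumulating predicted residuals from the last observed frame.
import numpy as np

def to_residual_targets(frames):
    """frames: (T, H, W) rainfall fields -> (T-1, H, W) adjacent differences."""
    return np.diff(frames, axis=0)

def reconstruct(last_obs, predicted_residuals):
    """Accumulate predicted trends back into absolute rainfall fields."""
    return last_obs + np.cumsum(predicted_residuals, axis=0)

obs = np.random.rand(4, 64, 64)             # past rainfall at 3-hour steps
targets = to_residual_targets(obs)          # what the model learns to predict
pred_res = np.random.rand(4, 64, 64) * 0.1  # stand-in for model output
forecast = reconstruct(obs[-1], pred_res)   # next 12 h at 3-hour resolution
```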
Location-guided Head Pose Estimation for Fisheye Image
Li, Bing, Zhang, Dong, Huang, Cheng, Xian, Yun, Li, Ming, Lee, Dah-Jye
A camera with a fisheye or ultra-wide lens covers a wide field of view that cannot be modeled by the perspective projection. Serious fisheye lens distortion in the peripheral region of the image leads to degraded performance of existing head pose estimation models trained on undistorted images; this distortion makes estimating head pose from fisheye images a very challenging task. This paper presents a new approach for head pose estimation that uses the knowledge of head location in the image to reduce the negative effect of fisheye distortion. We develop an end-to-end convolutional neural network to estimate the head pose with multi-task learning of head pose and head location. The goal of head pose estimation (HPE) in the context of computer vision is to estimate the orientation of the head with respect to the camera coordinate system; the estimated head pose is usually expressed by Euler angles (pitch, yaw, roll) [1]. Head pose estimation from rectilinear images has already achieved promising results, and an intuitive idea for estimating the head pose from fisheye images is the so-called two-stage approach.
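The multi-task formulation can be pictured as a single backbone with two heads whose losses are summed. A sketch of that objective; the loss choices and weighting below are assumptions, not the paper's exact design:

```python
# Sketch of the multi-task objective implied above: one network predicts the
# Euler angles (pitch, yaw, roll) and the head location, and the two losses
# are combined. Loss functions and the 0.5 weight are assumptions.
import torch
import torch.nn.functional as F

def multitask_loss(pred_pose, true_pose, pred_loc, true_loc, loc_weight=0.5):
    pose_loss = F.l1_loss(pred_pose, true_pose)  # angles, e.g. in degrees
    loc_loss = F.mse_loss(pred_loc, true_loc)    # normalized (x, y) position
    return pose_loss + loc_weight * loc_loss

pred_pose, true_pose = torch.randn(8, 3), torch.randn(8, 3)
pred_loc, true_loc = torch.rand(8, 2), torch.rand(8, 2)
loss = multitask_loss(pred_pose, true_pose, pred_loc, true_loc)
```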
Code Clone Detection based on Event Embedding and Event Dependency
Huang, Cheng, Zhou, Hui, Ye, Chunyang, Li, Bingzhuo
Code clone detection based on semantic similarity has important value in software engineering tasks (e.g., software evolution, software reuse). Traditional code clone detection techniques attend mainly to similarity at the syntax level and pay less attention to semantic similarity, so candidate codes that are similar in semantics are ignored. To address this issue, we propose a code clone detection method based on semantic similarity. By treating code as a series of interdependent events that occur continuously, we design a model, named EDAM, that encodes code semantic information based on event embedding and event dependency. The EDAM model uses event embedding to model the execution characteristics of program statements and the data-dependence information among all statements. In this way, we can embed the program's semantic information into a vector and use that vector to detect codes that are similar in semantics. Experimental results show that the performance of our EDAM model is superior to that of state-of-the-art open-source models for code clone detection.
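Once each code fragment is embedded into a vector, the final detection step reduces to a vector comparison. A sketch of that step; the embedding inputs and the similarity threshold below are placeholders, not values from the paper:

```python
# Sketch of the final clone-detection step: with EDAM-style embeddings in
# hand, semantic similarity reduces to cosine similarity plus a threshold.
import numpy as np

def cosine_similarity(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def is_semantic_clone(emb_a, emb_b, threshold=0.85):
    """Flag two fragments as clones when their embeddings are close enough."""
    return cosine_similarity(emb_a, emb_b) >= threshold

emb_a = np.random.rand(128)  # stand-ins for event-based code embeddings
emb_b = emb_a + 0.01 * np.random.rand(128)
print(is_semantic_clone(emb_a, emb_b))
```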
Learning physics-based reduced-order models for a single-injector combustion process
Swischuk, Renee, Kramer, Boris, Huang, Cheng, Willcox, Karen
This paper presents a physics-based data-driven method to learn predictive reduced-order models (ROMs) from high-fidelity simulations, and illustrates it in the challenging context of a single-injector combustion process. The method combines the perspectives of model reduction and machine learning. Model reduction brings in the physics of the problem, constraining the ROM predictions to lie on a subspace defined by the governing equations. This is achieved by defining the ROM in proper orthogonal decomposition (POD) coordinates, which embed the rich physics information contained in solution snapshots of a high-fidelity computational fluid dynamics (CFD) model. The machine learning perspective brings the flexibility to use transformed physical variables to define the POD basis. This is in contrast to traditional model reduction approaches that are constrained to use the physical variables of the high-fidelity code. Combining the two perspectives, the approach identifies a set of transformed physical variables that expose quadratic structure in the combustion governing equations and learns a quadratic ROM from transformed snapshot data. This learning does not require access to the high-fidelity model implementation. Numerical experiments show that the ROM accurately predicts temperature, pressure, velocity, species concentrations, and the limit-cycle amplitude, with speedups of more than five orders of magnitude over high-fidelity models. Moreover, ROM-predicted pressure traces accurately match the phase of the pressure signal and yield good approximations of the limit-cycle amplitude.
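The first step of this approach, extracting a POD basis from (transformed) solution snapshots, is standard and reduces to a singular value decomposition. A minimal sketch with a synthetic snapshot matrix; learning the quadratic ROM operators from the reduced coordinates is a further least-squares step not shown here:

```python
# Minimal sketch of computing a POD basis from snapshot data, the first step
# of the approach described above. The snapshot matrix here is synthetic.
import numpy as np

def pod_basis(snapshots, r):
    """snapshots: (n_states, n_snapshots) matrix of high-fidelity solutions.
    Returns the r leading left singular vectors as the POD basis."""
    U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
    return U[:, :r], s

X = np.random.rand(1000, 200)   # stand-in for transformed CFD snapshots
V, s = pod_basis(X, r=20)
X_reduced = V.T @ X             # ROM coordinates: r x n_snapshots
energy = (s[:20] ** 2).sum() / (s ** 2).sum()  # retained snapshot energy
```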
A Distributed One-Step Estimator
Huang, Cheng, Huo, Xiaoming
Distributed statistical inference has recently attracted enormous attention. Much existing work focuses on the averaging estimator. We propose a one-step approach to enhance a simple-averaging-based distributed estimator and derive the corresponding asymptotic properties of the newly proposed estimator. We find that the proposed one-step estimator enjoys the same asymptotic properties as the centralized estimator. The one-step approach merely requires one additional round of communication relative to the averaging estimator, so the extra communication burden is insignificant. In finite-sample cases, numerical examples show that the proposed estimator outperforms the simple averaging estimator by a large margin in terms of mean squared error. A potential application of the one-step approach is to use multiple machines to speed up large-scale statistical inference with little compromise in estimator quality. The proposed method becomes more valuable when data are only available on distributed machines with limited communication bandwidth.
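The one-step idea is that each machine reports its local estimate, the estimates are averaged, and then a single Newton-Raphson step is taken using gradients and Hessians evaluated locally at the averaged value and aggregated once. A minimal sketch; the local-statistics interface is an assumption about how such a scheme could be organized, not the paper's code:

```python
# Minimal sketch of a one-step distributed estimator: average the local
# estimates, then take one Newton-Raphson step on the pooled log-likelihood
# using gradients/Hessians computed locally at the averaged value.
import numpy as np

def one_step_estimator(local_thetas, local_stats):
    """local_thetas: list of p-vectors, one estimate per machine.
    local_stats: callable theta -> list of (gradient, hessian) per machine."""
    theta_bar = np.mean(local_thetas, axis=0)       # averaging estimator
    stats = local_stats(theta_bar)                  # one extra comm. round
    grad = sum(g for g, _ in stats)
    hess = sum(h for _, h in stats)
    return theta_bar - np.linalg.solve(hess, grad)  # single Newton step

# Toy usage: Gaussian mean with identity covariance on 3 machines; here the
# one-step estimator recovers the pooled sample mean exactly.
data = [np.random.randn(100, 2) + 1.0 for _ in range(3)]
thetas = [d.mean(axis=0) for d in data]
stats = lambda t: [((d - t).sum(axis=0), -len(d) * np.eye(2)) for d in data]
print(one_step_estimator(thetas, stats))
```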