Collaborating Authors: Yang, Jingyi


Shall Your Data Strategy Work? Perform a Swift Study

arXiv.org Artificial Intelligence

This work presents a swift method to assess the efficacy of particular types of instruction-tuning data, utilizing just a handful of probe examples and eliminating the need for model retraining. The method builds on gradient-based data influence estimation: it projects the gradients of probe examples from the chosen strategy onto the gradients of evaluation examples to gauge the strategy's benefit. Building upon this method, we conducted three swift studies to investigate the potential of Chain-of-Thought (CoT) data, query clarification data, and response evaluation data for enhancing model generalization. We then conducted a validation study to corroborate the findings of these swift studies: for each studied strategy, we constructed a tailored training dataset and compared model performance with and without it. The results of the validation study aligned with the findings of the swift studies, confirming the efficacy of our proposed method.
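
A minimal sketch of the gradient-projection idea in PyTorch (the paper's exact estimator may differ; `loss_fn` and the batch format are hypothetical): the influence of probe data on evaluation data is approximated by the alignment of their loss gradients.

```python
# Sketch of gradient-based influence estimation via gradient projection
# (assumed formulation: influence ~ similarity of probe and eval gradients).
import torch

def flat_grad(model, loss):
    """Flatten the gradient of a scalar `loss` w.r.t. all trainable parameters."""
    params = [p for p in model.parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss, params)
    return torch.cat([g.reshape(-1) for g in grads])

def influence(model, loss_fn, probe_batch, eval_batch):
    """Cosine similarity between probe and evaluation gradients.
    A positive score suggests training on the probe data would move the
    model in a direction that also lowers the evaluation loss."""
    g_probe = flat_grad(model, loss_fn(model, probe_batch))
    g_eval = flat_grad(model, loss_fn(model, eval_batch))
    return torch.nn.functional.cosine_similarity(g_probe, g_eval, dim=0)
```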


STGCN-LSTM for Olympic Medal Prediction: Dynamic Power Modeling and Causal Policy Optimization

arXiv.org Artificial Intelligence

This paper proposes a novel hybrid model, STGCN-LSTM, to forecast Olympic medal distributions by integrating the spatio-temporal relationships among countries and the long-term dependencies of national performance. The Spatial-Temporal Graph Convolution Network (STGCN) captures geographic and interactive factors, such as coaching exchange and socio-economic links, while the Long Short-Term Memory (LSTM) module models historical trends in medal counts, economic data, and demographics. To address zero-inflated outputs (i.e., the disparity between countries that consistently win medals and those that have never won any), a Zero-Inflated Compound Poisson (ZICP) framework is incorporated to separate random zeros from structural zeros, providing a clearer view of potential breakthrough performances. Validation includes historical backtesting, policy shock simulations, and causal inference checks, confirming the robustness of the proposed method. Results shed light on the influence of coaching mobility, event specialization, and strategic investment on medal forecasts, offering a data-driven foundation for optimizing sports policies and resource allocation in diverse Olympic contexts.
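
To illustrate the zero-inflation idea, here is a sketch using a plain zero-inflated Poisson as a stand-in for the paper's compound Poisson: a country produces a structural zero with probability `pi`, otherwise a Poisson draw, so zeros can arise two ways.

```python
# Zero-inflated count likelihood sketch (simplified stand-in for ZICP):
# P(y=0) = pi + (1-pi)*exp(-lam), P(y=k>0) = (1-pi)*Poisson(k; lam).
import numpy as np
from scipy.stats import poisson

def zip_log_likelihood(y, pi, lam):
    """Log-likelihood of medal counts y under a zero-inflated Poisson."""
    y = np.asarray(y)
    ll = np.where(
        y == 0,
        np.log(pi + (1.0 - pi) * np.exp(-lam)),     # structural or random zero
        np.log(1.0 - pi) + poisson.logpmf(y, lam),  # positive counts
    )
    return ll.sum()
```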


Context Parallelism for Scalable Million-Token Inference

arXiv.org Artificial Intelligence

We present context parallelism for long-context large language model inference, which achieves near-linear scaling of long-context prefill latency with up to 128 H100 GPUs across 16 nodes. In particular, our method achieves 1M-token context prefill with the Llama3 405B model in 77s (93% parallelization efficiency, 63% FLOPS utilization) and 128K context prefill in 3.8s. We develop two lossless, exact ring attention variants, pass-KV and pass-Q, that cover a wide range of use cases with state-of-the-art performance: full prefill, persistent-KV prefill, and decode. Benchmarks on H100 GPU hosts interconnected with RDMA and with TCP both show similar scalability for long-context prefill, demonstrating that our method scales well in common commercial data centers with medium-to-low inter-host bandwidth.
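
A single-process sketch of the pass-KV communication pattern (the pass-Q variant, which rotates query shards instead, is analogous): each rank owns a query shard and a KV shard, KV shards rotate around the ring, and each rank folds incoming blocks into a running online softmax. This is a non-causal numpy simulation for clarity, not the distributed implementation.

```python
# Simulation of pass-KV ring attention with online-softmax accumulation.
import numpy as np

def ring_attention_pass_kv(Q, K, V, world_size):
    d = Q.shape[-1]
    Qs = np.array_split(Q, world_size)          # per-rank query shards
    Ks = np.array_split(K, world_size)          # per-rank KV shards
    Vs = np.array_split(V, world_size)
    outs = []
    for r in range(world_size):                 # loop stands in for the ranks
        q = Qs[r]
        m = np.full(q.shape[0], -np.inf)        # running row max
        l = np.zeros(q.shape[0])                # running softmax denominator
        acc = np.zeros_like(q)                  # running weighted-V sum
        for step in range(world_size):          # one ring rotation per step
            src = (r - step) % world_size       # which KV shard arrives now
            s = q @ Ks[src].T / np.sqrt(d)
            m_new = np.maximum(m, s.max(axis=1))
            scale = np.exp(m - m_new)           # rescale old accumulators
            p = np.exp(s - m_new[:, None])
            l = l * scale + p.sum(axis=1)
            acc = acc * scale[:, None] + p @ Vs[src]
            m = m_new
        outs.append(acc / l[:, None])
    return np.vstack(outs)                      # matches full softmax attention
```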


FedPAE: Peer-Adaptive Ensemble Learning for Asynchronous and Model-Heterogeneous Federated Learning

arXiv.org Artificial Intelligence

Federated learning (FL) enables multiple clients with distributed data sources to collaboratively train a shared model without compromising data privacy. However, existing FL paradigms face challenges due to heterogeneity in client data distributions and system capabilities. Personalized federated learning (pFL) has been proposed to mitigate these problems, but often requires a shared model architecture and a central entity for parameter aggregation, resulting in scalability and communication issues. More recently, model-heterogeneous FL has gained attention due to its ability to support diverse client models, but existing methods are limited by their dependence on a centralized framework, synchronized training, and publicly available datasets. To address these limitations, we introduce Federated Peer-Adaptive Ensemble Learning (FedPAE), a fully decentralized pFL algorithm that supports model heterogeneity and asynchronous learning. Our approach utilizes a peer-to-peer model sharing mechanism and ensemble selection to achieve a more refined balance between local and global information. Experimental results show that FedPAE outperforms existing state-of-the-art pFL algorithms, effectively managing diverse client capabilities and demonstrating robustness against statistical heterogeneity.
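
The abstract does not spell out the selection procedure, so here is one plausible sketch of peer-adaptive ensemble selection: greedy forward selection of received peer models, keeping a peer only if it improves majority-vote accuracy on the client's local validation data. All names are illustrative.

```python
# Hypothetical ensemble-selection sketch for a FedPAE-style client.
import numpy as np

def ensemble_acc(models, X, y):
    """Majority-vote accuracy of an ensemble (assumes integer class labels)."""
    votes = np.stack([m.predict(X) for m in models])  # (n_models, n_samples)
    preds = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
    return (preds == y).mean()

def select_peers(local_model, peer_models, X_val, y_val):
    """Start from the local model; add a peer model only if it helps."""
    chosen = [local_model]
    best = ensemble_acc(chosen, X_val, y_val)
    for peer in peer_models:
        cand = ensemble_acc(chosen + [peer], X_val, y_val)
        if cand > best:
            chosen.append(peer)
            best = cand
    return chosen
```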


Design and Verification of a Novel Triphibian Robot

arXiv.org Artificial Intelligence

Multi-modal robots extend their operation from one working medium to another, from land to air for example. The majority of multi-modal robots are platforms that operate in two different media; for all-terrain tasks, however, there is little research to date in the literature. Generally, locomotion in different working media, i.e., land, water, and air, requires different propelling actuators, and thus a triphibian system becomes bulky. To overcome this challenge, we propose a triphibian robot whose driving forces support all-terrain operation in an efficient way. A morphable mechanism is designed to enable the transition between different motion modes; specifically, a cylindrical body is implemented as the rolling mechanism in land mode. Detailed design principles of the different mechanisms and the transitions between the various locomotion modes are analyzed. Finally, a triphibian robot prototype is fabricated and tested in various working media with both mono-modal and multi-modal functionalities. Experiments verify our platform, and the results show promising adaptability for future exploration tasks in various working scenarios.


Cardinality Estimation in DBMS: A Comprehensive Benchmark Evaluation

arXiv.org Artificial Intelligence

Cardinality estimation (CardEst) plays a significant role in generating high-quality query plans for a query optimizer in a DBMS. In the last decade, an increasing number of advanced CardEst methods (especially ML-based ones) have been proposed, with outstanding estimation accuracy and inference latency. However, no study has systematically evaluated the quality of these methods or answered the fundamental question: to what extent can these methods improve the performance of the query optimizer in real-world settings, which is the ultimate goal of a CardEst method? In this paper, we comprehensively and systematically compare the effectiveness of CardEst methods in a real DBMS. We establish a new benchmark for CardEst, which contains a new complex real-world dataset, STATS, and a diverse query workload, STATS-CEB. We integrate the most representative CardEst methods into the open-source database system PostgreSQL and comprehensively evaluate their true effectiveness in improving query plan quality, as well as other important aspects affecting their applicability, ranging from inference latency, model size, and training time to update efficiency and accuracy. We obtain a number of key findings for the CardEst methods under different data and query settings. Furthermore, we find that the widely used estimation accuracy metric (Q-Error) cannot distinguish the importance of different sub-plan queries during query optimization and thus cannot truly reflect the quality of the query plans generated by CardEst methods. Therefore, we propose a new metric, P-Error, to evaluate the performance of CardEst methods, which overcomes the limitation of Q-Error and is able to reflect the overall end-to-end performance of CardEst methods. We have made all of the benchmark data and evaluation code publicly available at https://github.com/Nathaniel-Han/End-to-End-CardEst-Benchmark.
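
A sketch of the contrast between the two metrics: Q-Error scores a single estimate in isolation, while a P-Error-style metric scores the plan that the estimates induce. Here `choose_plan` and `plan_cost` are hypothetical stand-ins for the optimizer's plan search and cost model; the paper's exact definition may differ in detail.

```python
# Q-Error vs. a P-Error-style plan-quality metric (sketch).
def q_error(est: float, true: float) -> float:
    """max(est/true, true/est); cardinalities clamped to >= 1."""
    est, true = max(est, 1.0), max(true, 1.0)
    return max(est / true, true / est)

def p_error(query, est_cards, true_cards, choose_plan, plan_cost):
    """Cost, under true cardinalities, of the plan picked with estimated
    cardinalities, relative to the plan picked with true cardinalities."""
    plan_est = choose_plan(query, est_cards)    # plan the optimizer actually picks
    plan_true = choose_plan(query, true_cards)  # plan it would pick ideally
    return plan_cost(plan_est, true_cards) / plan_cost(plan_true, true_cards)
```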


Partially Interpretable Estimators (PIE): Black-Box-Refined Interpretable Machine Learning

arXiv.org Artificial Intelligence

We propose Partially Interpretable Estimators (PIE), which attribute a prediction to individual features via an interpretable model, while a (possibly) small part of the PIE prediction is attributed to feature interactions via a black-box model, with the goal of boosting predictive performance while maintaining interpretability. As such, the interpretable model captures the main contributions of individual features, and the black-box model attempts to complement the interpretable piece by capturing the "nuances" of feature interactions as a refinement. We design an iterative training algorithm to jointly train the two types of models. Experimental results show that PIE is highly competitive with black-box models while outperforming interpretable baselines. In addition, the understandability of PIE is comparable to that of simple linear models, as validated via a human evaluation.
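
A minimal sketch of the PIE idea with scikit-learn pieces: a linear model carries the per-feature attribution, and a black-box model is fit to its residuals to capture interactions. The alternating residual-fitting loop is an assumption for illustration; the paper's joint training algorithm may differ.

```python
# Sketch: interpretable main effects + black-box residual refinement.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.ensemble import GradientBoostingRegressor

def fit_pie(X, y, rounds=3):
    linear = Ridge(alpha=1.0)                   # interpretable part
    blackbox = GradientBoostingRegressor()      # interaction "nuances"
    bb_pred = np.zeros(len(y))
    for _ in range(rounds):
        linear.fit(X, y - bb_pred)              # main effects on current residuals
        blackbox.fit(X, y - linear.predict(X))  # refine what the linear part misses
        bb_pred = blackbox.predict(X)
    predict = lambda Xn: linear.predict(Xn) + blackbox.predict(Xn)
    return linear, blackbox, predict
```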