Li, Li
FedGCS: A Generative Framework for Efficient Client Selection in Federated Learning via Gradient-based Optimization
Ning, Zhiyuan, Tian, Chunlin, Xiao, Meng, Fan, Wei, Wang, Pengyang, Li, Li, Wang, Pengfei, Zhou, Yuanchun
Federated Learning faces significant challenges in statistical and system heterogeneity, along with high energy consumption, necessitating efficient client selection strategies. Traditional approaches, including heuristic and learning-based methods, fall short of addressing these complexities holistically. In response, we propose FedGCS, a novel generative client selection framework that innovatively recasts the client selection process as a generative task. Drawing inspiration from the methodologies used in large language models, FedGCS efficiently encodes abundant decision-making knowledge within a continuous representation space, enabling efficient gradient-based optimization to search for optimal client selection that will be finally output via generation. The framework comprises four steps: (1) automatic collection of diverse "selection-score" pair data using classical client selection methods; (2) training an encoder-evaluator-decoder framework on this data to construct a continuous representation space; (3) employing gradient-based optimization in this space for optimal client selection; (4) generating the final optimal client selection via using beam search for the well-trained decoder. FedGCS outperforms traditional methods by being more comprehensive, generalizable, and efficient, simultaneously optimizing for model performance, latency, and energy consumption. The effectiveness of FedGCS is proven through extensive experimental analyses.
MGS-SLAM: Monocular Sparse Tracking and Gaussian Mapping with Depth Smooth Regularization
Zhu, Pengcheng, Zhuang, Yaoming, Chen, Baoquan, Li, Li, Wu, Chengdong, Liu, Zhanlin
This letter introduces a novel framework for dense Visual Simultaneous Localization and Mapping (VSLAM) based on Gaussian Splatting. Recently Gaussian Splatting-based SLAM has yielded promising results, but rely on RGB-D input and is weak in tracking. To address these limitations, we uniquely integrates advanced sparse visual odometry with a dense Gaussian Splatting scene representation for the first time, thereby eliminating the dependency on depth maps typical of Gaussian Splatting-based SLAM systems and enhancing tracking robustness. Here, the sparse visual odometry tracks camera poses in RGB stream, while Gaussian Splatting handles map reconstruction. These components are interconnected through a Multi-View Stereo (MVS) depth estimation network. And we propose a depth smooth loss to reduce the negative effect of estimated depth maps. Furthermore, the consistency in scale between the sparse visual odometry and the dense Gaussian map is preserved by Sparse-Dense Adjustment Ring (SDAR). We have evaluated our system across various synthetic and real-world datasets. The accuracy of our pose estimation surpasses existing methods and achieves state-of-the-art performance. Additionally, it outperforms previous monocular methods in terms of novel view synthesis fidelity, matching the results of neural SLAM systems that utilize RGB-D input.
Leverage Multi-source Traffic Demand Data Fusion with Transformer Model for Urban Parking Prediction
Huang, Yin, Dong, Yongqi, Tang, Youhua, Li, Li
The escalation in urban private car ownership has worsened the urban parking predicament, necessitating effective parking availability prediction for urban planning and management. However, the existing prediction methods suffer from low prediction accuracy with the lack of spatial-temporal correlation features related to parking volume, and neglect of flow patterns and correlations between similar parking lots within certain areas. To address these challenges, this study proposes a parking availability prediction framework integrating spatial-temporal deep learning with multi-source data fusion, encompassing traffic demand data from multiple sources (e.g., metro, bus, taxi services), and parking lot data. The framework is based on the Transformer as the spatial-temporal deep learning model and leverages K-means clustering to establish parking cluster zones, extracting and integrating traffic demand characteristics from various transportation modes (i.e., metro, bus, online ride-hailing, and taxi) connected to parking lots. Real-world empirical data was used to verify the effectiveness of the proposed method compared with different machine learning, deep learning, and traditional statistical models for predicting parking availability. Experimental results reveal that, with the proposed pipeline, the developed Transformer model outperforms other models in terms of various metrics, e.g., Mean Squared Error (MSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE). By fusing multi-source demanding data with spatial-temporal deep learning techniques, this approach offers the potential to develop parking availability prediction systems that furnish more accurate and timely information to both drivers and urban planners, thereby fostering more efficient and sustainable urban mobility.
AI Coders Are Among Us: Rethinking Programming Language Grammar Towards Efficient Code Generation
Sun, Zhensu, Du, Xiaoning, Yang, Zhou, Li, Li, Lo, David
Besides humans and machines, Artificial Intelligence (AI) models have emerged to be another important audience of programming languages, as we come to the era of large language models (LLMs). LLMs can now excel at coding competitions and even program like developers to address various tasks, such as math calculation. Yet, the grammar and layout of existing programs are designed for humans. Particularly, abundant grammar tokens and formatting tokens are included to make the code more readable to humans. While beneficial, such a human-centric design imposes an unnecessary computational burden on LLMs where each token, either consumed or generated, consumes computational resources. To improve inference efficiency and reduce computational costs, we propose the concept of AI-oriented grammar, which aims to represent the code in a way that better suits the working mechanism of AI models. Code written with AI-oriented grammar discards formats and uses a minimum number of tokens to convey code semantics effectively. To demonstrate the feasibility of this concept, we explore and implement the first AI-oriented grammar for Python, named Simple Python (SimPy). SimPy is crafted by revising the original Python grammar through a series of heuristic rules. Programs written in SimPy maintain identical Abstract Syntax Tree (AST) structures to those in standard Python, allowing execution via a modified AST parser. In addition, we explore methods to enable existing LLMs to proficiently understand and use SimPy, and ensure the changes remain imperceptible for human developers. Compared with the original Python, SimPy not only reduces token usage by 13.5% and 10.4% for CodeLlama and GPT-4, but can also achieve equivalent, even improved, performance over the models trained on Python code.
Breaking the Memory Wall for Heterogeneous Federated Learning with Progressive Training
Wu, Yebo, Li, Li, Tian, Chunlin, Xu, Chengzhong
This paper presents ProFL, a novel progressive FL framework to effectively break the memory wall. Specifically, ProFL divides the model into different blocks based on its original architecture. Instead of updating the full model in each training round, ProFL first trains the front blocks and safely freezes them after convergence. Training of the next block is then triggered. This process iterates until the training of the whole model is completed. In this way, the memory footprint is effectively reduced for feasible deployment on heterogeneous devices. In order to preserve the feature representation of each block, we decouple the whole training process into two stages: progressive model shrinking and progressive model growing. During the progressive model shrinking stage, we meticulously design corresponding output modules to assist each block in learning the expected feature representation and obtain the initialization parameters. Then, the obtained output modules are utilized in the corresponding progressive model growing stage. Additionally, to control the training pace for each block, a novel metric from the scalar perspective is proposed to assess the learning status of each block and determines when to trigger the training of the next one. Finally, we theoretically prove the convergence of ProFL and conduct extensive experiments on representative models and datasets to evaluate the effectiveness of ProFL. The results demonstrate that ProFL effectively reduces the peak memory footprint by up to 57.4% and improves model accuracy by up to 82.4%.
Open-Source AI-based SE Tools: Opportunities and Challenges of Collaborative Software Learning
Lin, Zhihao, Ma, Wei, Lin, Tao, Zheng, Yaowen, Ge, Jingquan, Wang, Jun, Klein, Jacques, Bissyande, Tegawende, Liu, Yang, Li, Li
Large Language Models (LLMs) have become instrumental in advancing software engineering (SE) tasks, showcasing their efficacy in code understanding and beyond. Like traditional SE tools, open-source collaboration is key in realising the excellent products. However, with AI models, the essential need is in data. The collaboration of these AI-based SE models hinges on maximising the sources of high-quality data. However, data especially of high quality, often holds commercial or sensitive value, making it less accessible for open-source AI-based SE projects. This reality presents a significant barrier to the development and enhancement of AI-based SE tools within the software engineering community. Therefore, researchers need to find solutions for enabling open-source AI-based SE models to tap into resources by different organisations. Addressing this challenge, our position paper investigates one solution to facilitate access to diverse organizational resources for open-source AI models, ensuring privacy and commercial sensitivities are respected. We introduce a governance framework centered on federated learning (FL), designed to foster the joint development and maintenance of open-source AI code models while safeguarding data privacy and security. Additionally, we present guidelines for developers on AI-based SE tool collaboration, covering data requirements, model architecture, updating strategies, and version control. Given the significant influence of data characteristics on FL, our research examines the effect of code data heterogeneity on FL performance.
Life-long Learning and Testing for Automated Vehicles via Adaptive Scenario Sampling as A Continuous Optimization Process
Ge, Jingwei, Wang, Pengbo, Chang, Cheng, Zhang, Yi, Yao, Danya, Li, Li
Sampling critical testing scenarios is an essential step in intelligence testing for Automated Vehicles (AVs). However, due to the lack of prior knowledge on the distribution of critical scenarios in sampling space, we can hardly efficiently find the critical scenarios or accurately evaluate the intelligence of AVs. To solve this problem, we formulate the testing as a continuous optimization process which iteratively generates potential critical scenarios and meanwhile evaluates these scenarios. A bi-level loop is proposed for such life-long learning and testing. In the outer loop, we iteratively learn space knowledge by evaluating AV in the already sampled scenarios and then sample new scenarios based on the retained knowledge. Outer loop stops when all generated samples cover the whole space. While to maximize the coverage of the space in each outer loop, we set an inner loop which receives newly generated samples in outer loop and outputs the updated positions of these samples. We assume that points in a small sphere-like subspace can be covered (or represented) by the point in the center of this sphere. Therefore, we can apply a multi-rounds heuristic strategy to move and pack these spheres in space to find the best covering solution. The simulation results show that faster and more accurate evaluation of AVs can be achieved with more critical scenarios.
Instance-aware Exploration-Verification-Exploitation for Instance ImageGoal Navigation
Lei, Xiaohan, Wang, Min, Zhou, Wengang, Li, Li, Li, Houqiang
As a new embodied vision task, Instance ImageGoal Navigation (IIN) aims to navigate to a specified object depicted by a goal image in an unexplored environment. The main challenge of this task lies in identifying the target object from different viewpoints while rejecting similar distractors. Existing ImageGoal Navigation methods usually adopt the simple Exploration-Exploitation framework and ignore the identification of specific instance during navigation. In this work, we propose to imitate the human behaviour of ``getting closer to confirm" when distinguishing objects from a distance. Specifically, we design a new modular navigation framework named Instance-aware Exploration-Verification-Exploitation (IEVE) for instance-level image goal navigation. Our method allows for active switching among the exploration, verification, and exploitation actions, thereby facilitating the agent in making reasonable decisions under different situations. On the challenging HabitatMatterport 3D semantic (HM3D-SEM) dataset, our method surpasses previous state-of-the-art work, with a classical segmentation model (0.684 vs. 0.561 success) or a robust model (0.702 vs. 0.561 success)
Towards Personalized Evaluation of Large Language Models with An Anonymous Crowd-Sourcing Platform
Cheng, Mingyue, Zhang, Hao, Yang, Jiqian, Liu, Qi, Li, Li, Huang, Xin, Song, Liwei, Li, Zhi, Huang, Zhenya, Chen, Enhong
Large language model evaluation plays a pivotal role in the enhancement of its capacity. Previously, numerous methods for evaluating large language models have been proposed in this area. Despite their effectiveness, these existing works mainly focus on assessing objective questions, overlooking the capability to evaluate subjective questions which is extremely common for large language models. Additionally, these methods predominantly utilize centralized datasets for evaluation, with question banks concentrated within the evaluation platforms themselves. Moreover, the evaluation processes employed by these platforms often overlook personalized factors, neglecting to consider the individual characteristics of both the evaluators and the models being evaluated. To address these limitations, we propose a novel anonymous crowd-sourcing evaluation platform, BingJian, for large language models that employs a competitive scoring mechanism where users participate in ranking models based on their performance. This platform stands out not only for its support of centralized evaluations to assess the general capabilities of models but also for offering an open evaluation gateway. Through this gateway, users have the opportunity to submit their questions, testing the models on a personalized and potentially broader range of capabilities. Furthermore, our platform introduces personalized evaluation scenarios, leveraging various forms of human-computer interaction to assess large language models in a manner that accounts for individual user preferences and contexts. The demonstration of BingJian can be accessed at https://github.com/Mingyue-Cheng/Bingjian.
DLP-GAN: learning to draw modern Chinese landscape photos with generative adversarial network
Gui, Xiangquan, Zhang, Binxuan, Li, Li, Yang, Yi
Chinese landscape painting has a unique and artistic style, and its drawing technique is highly abstract in both the use of color and the realistic representation of objects. Previous methods focus on transferring from modern photos to ancient ink paintings. However, little attention has been paid to translating landscape paintings into modern photos. To solve such problems, in this paper, we (1) propose DLP-GAN (Draw Modern Chinese Landscape Photos with Generative Adversarial Network), an unsupervised cross-domain image translation framework with a novel asymmetric cycle mapping, and (2) introduce a generator based on a dense-fusion module to match different translation directions. Moreover, a dual-consistency loss is proposed to balance the realism and abstraction of model painting. In this way, our model can draw landscape photos and sketches in the modern sense. Finally, based on our collection of modern landscape and sketch datasets, we compare the images generated by our model with other benchmarks. Extensive experiments including user studies show that our model outperforms state-of-the-art methods.