Goto

Collaborating Authors

 Chen, Yan


Q-PETR: Quant-aware Position Embedding Transformation for Multi-View 3D Object Detection

arXiv.org Artificial Intelligence

Camera-based multi-view 3D detection has emerged as an attractive solution for autonomous driving due to its low cost and broad applicability. However, despite the strong performance of PETR-based methods in 3D perception benchmarks, their direct INT8 quantization for onboard deployment leads to drastic accuracy drops-up to 58.2% in mAP and 36.9% in NDS on the NuScenes dataset. In this work, we propose Q-PETR, a quantization-aware position embedding transformation that re-engineers key components of the PETR framework to reconcile the discrepancy between the dynamic ranges of positional encodings and image features, and to adapt the cross-attention mechanism for low-bit inference. By redesigning the positional encoding module and introducing an adaptive quantization strategy, Q-PETR maintains floating-point performance with a performance degradation of less than 1% under standard 8-bit per-tensor post-training quantization. Moreover, compared to its FP32 counterpart, Q-PETR achieves a two-fold speedup and reduces memory usage by three times, thereby offering a deployment-friendly solution for resource-constrained onboard devices. Extensive experiments across various PETR-series models validate the strong generalization and practical benefits of our approach.


Concurrent Learning with Aggregated States via Randomized Least Squares Value Iteration

arXiv.org Artificial Intelligence

Designing learning agents that explore efficiently in a complex environment has been widely recognized as a fundamental challenge in reinforcement learning. While a number of works have demonstrated the effectiveness of techniques based on randomized value functions on a single agent, it remains unclear, from a theoretical point of view, whether injecting randomization can help a society of agents {\it concurently} explore an environment. The theoretical results %that we established in this work tender an affirmative answer to this question. We adapt the concurrent learning framework to \textit{randomized least-squares value iteration} (RLSVI) with \textit{aggregated state representation}. We demonstrate polynomial worst-case regret bounds in both finite- and infinite-horizon environments. In both setups the per-agent regret decreases at an optimal rate of $\Theta\left(\frac{1}{\sqrt{N}}\right)$, highlighting the advantage of concurent learning. Our algorithm exhibits significantly lower space complexity compared to \cite{russo2019worst} and \cite{agrawal2021improved}. We reduce the space complexity by a factor of $K$ while incurring only a $\sqrt{K}$ increase in the worst-case regret bound, compared to \citep{agrawal2021improved,russo2019worst}. Additionally, we conduct numerical experiments to demonstrate our theoretical findings.


Sharper Error Bounds in Late Fusion Multi-view Clustering Using Eigenvalue Proportion

arXiv.org Artificial Intelligence

Multi-view clustering (MVC) aims to integrate complementary information from multiple views to enhance clustering performance. Late Fusion Multi-View Clustering (LFMVC) has shown promise by synthesizing diverse clustering results into a unified consensus. However, current LFMVC methods struggle with noisy and redundant partitions and often fail to capture high-order correlations across views. To address these limitations, we present a novel theoretical framework for analyzing the generalization error bounds of multiple kernel $k$-means, leveraging local Rademacher complexity and principal eigenvalue proportions. Our analysis establishes a convergence rate of $\mathcal{O}(1/n)$, significantly improving upon the existing rate in the order of $\mathcal{O}(\sqrt{k/n})$. Building on this insight, we propose a low-pass graph filtering strategy within a multiple linear $k$-means framework to mitigate noise and redundancy, further refining the principal eigenvalue proportion and enhancing clustering accuracy. Experimental results on benchmark datasets confirm that our approach outperforms state-of-the-art methods in clustering performance and robustness. The related codes is available at https://github.com/csliangdu/GMLKM .


GSRender: Deduplicated Occupancy Prediction via Weakly Supervised 3D Gaussian Splatting

arXiv.org Artificial Intelligence

3D occupancy perception is gaining increasing attention due to its capability to offer detailed and precise environment representations. Previous weakly-supervised NeRF methods balance efficiency and accuracy, with mIoU varying by 5-10 points due to sampling count along camera rays. Recently, real-time Gaussian splatting has gained widespread popularity in 3D reconstruction, and the occupancy prediction task can also be viewed as a reconstruction task. Consequently, we propose GSRender, which naturally employs 3D Gaussian Splatting for occupancy prediction, simplifying the sampling process. In addition, the limitations of 2D supervision result in duplicate predictions along the same camera ray. We implemented the Ray Compensation (RC) module, which mitigates this issue by compensating for features from adjacent frames. Finally, we redesigned the loss to eliminate the impact of dynamic objects from adjacent frames. Extensive experiments demonstrate that our approach achieves SOTA (state-of-the-art) results in RayIoU (+6.0), while narrowing the gap with 3D supervision methods. Our code will be released soon.


Clustering ensemble algorithm with high-order consistency learning

arXiv.org Artificial Intelligence

Most of the research on clustering ensemble focuses on designing practical consistency learning algorithms.To solve the problems that the quality of base clusters varies and the low-quality base clusters have an impact on the performance of the clustering ensemble, from the perspective of data mining, the intrinsic connections of data were mined based on the base clusters, and a high-order information fusion algorithm was proposed to represent the connections between data from different dimensions, namely Clustering Ensemble with High-order Consensus learning (HCLCE). Firstly, each high-order information was fused into a new structured consistency matrix. Then, the obtained multiple consistency matrices were fused together. Finally, multiple information was fused into a consistent result. Experimental results show that LCLCE algorithm has the clustering accuracy improved by an average of 7.22%, and the Normalized Mutual Information (NMI) improved by an average of 9.19% compared with the suboptimal Locally Weighted Evidence Accumulation (LWEA) algorithm. It can be seen that the proposed algorithm can obtain better clustering results compared with clustering ensemble algorithms and using one information alone.


Unsupervised Feature Selection Algorithm Based on Graph Filtering and Self-representation

arXiv.org Artificial Intelligence

Aiming at the problem that existing methods could not fully capture the intrinsic structure of data without considering the higher-order neighborhood information of the data, we proposed an unsupervised feature selection algorithm based on graph filtering and self-representation. Firstly,a higher-order graph filter was applied to the data to obtain its smooth representation,and a regularizer was designed to combine the higher-order graph information for the self-representation matrix learning to capture the intrinsic structure of the data. Secondly,l2,1 norm was used to reconstruct the error term and feature selection matrix to enhance the robustness and row sparsity of the model to select the discriminant features. Finally, an iterative algorithm was applied to effectively solve the proposed objective function and simulation experiments were carried out to verify the effectiveness of the proposed algorithm.


Unsupervised Feature Selection Algorithm Based on Dual Manifold Re-ranking

arXiv.org Artificial Intelligence

High-dimensional data is commonly encountered in numerous data analysis tasks. Feature selection techniques aim to identify the most representative features from the original high-dimensional data. Due to the absence of class label information, it is significantly more challenging to select appropriate features in unsupervised learning scenarios compared to supervised ones. Traditional unsupervised feature selection methods typically score the features of samples based on certain criteria, treating samples indiscriminately. However, these approaches fail to fully capture the internal structure of the data. The importance of different samples should vary, and there is a dual relationship between the weight of samples and features that will influence each other. Therefore, an unsupervised feature selection algorithm based on dual manifold re-ranking (DMRR) is proposed in this paper. Different similarity matrices are constructed to depict the manifold structures among samples, between samples and features, and among features themselves. Then, manifold re-ranking is performed by combining the initial scores of samples and features. By comparing DMRR with three original unsupervised feature selection algorithms and two unsupervised feature selection post-processing algorithms, experimental results confirm that the importance information of different samples and the dual relationship between sample and feature are beneficial for achieving better feature selection.


Optimal Downsampling for Imbalanced Classification with Generalized Linear Models

arXiv.org Machine Learning

Downsampling or under-sampling is a technique that is utilized in the context of large and highly imbalanced classification models. We study optimal downsampling for imbalanced classification using generalized linear models (GLMs). We propose a pseudo maximum likelihood estimator and study its asymptotic normality in the context of increasingly imbalanced populations relative to an increasingly large sample size. We provide theoretical guarantees for the introduced estimator. Additionally, we compute the optimal downsampling rate using a criterion that balances statistical accuracy and computational efficiency. Our numerical experiments, conducted on both synthetic and empirical data, further validate our theoretical results, and demonstrate that the introduced estimator outperforms commonly available alternatives.


Remaining Useful Life Prediction: A Study on Multidimensional Industrial Signal Processing and Efficient Transfer Learning Based on Large Language Models

arXiv.org Artificial Intelligence

Remaining useful life (RUL) prediction is crucial for maintaining modern industrial systems, where equipment reliability and operational safety are paramount. Traditional methods, based on small-scale deep learning or physical/statistical models, often struggle with complex, multidimensional sensor data and varying operating conditions, limiting their generalization capabilities. To address these challenges, this paper introduces an innovative regression framework utilizing large language models (LLMs) for RUL prediction. By leveraging the modelling power of LLMs pre-trained on corpus data, the proposed model can effectively capture complex temporal dependencies and improve prediction accuracy. Extensive experiments on the Turbofan engine's RUL prediction task show that the proposed model surpasses state-of-the-art (SOTA) methods on the challenging FD002 and FD004 subsets and achieves near-SOTA results on the other subsets. Notably, different from previous researches, our framework uses the same sliding window length and all sensor signals for all subsets, demonstrating strong consistency and generalization. Moreover, transfer learning experiments reveal that with minimal target domain data for fine-tuning, the model outperforms SOTA methods trained on full target domain data. This research highlights the significant potential of LLMs in industrial signal processing and RUL prediction, offering a forward-looking solution for health management in future intelligent industrial systems.


Generative AI Application for Building Industry

arXiv.org Artificial Intelligence

This paper investigates the transformative potential of generative AI technologies, particularly large language models (LLMs), within the building industry. By leveraging these advanced AI tools, the study explores their application across key areas such as energy code compliance, building design optimization, and workforce training. The research highlights how LLMs can automate labor-intensive processes, significantly improving efficiency, accuracy, and safety in building practices. The paper also addresses the challenges associated with interpreting complex visual and textual data in architectural plans and regulatory codes, proposing innovative solutions to enhance AI-driven compliance checking and design processes. Additionally, the study considers the broader implications of AI integration, including the development of AI-powered tools for comprehensive code compliance across various regulatory domains and the potential for AI to revolutionize workforce training through realistic simulations. This paper provides a comprehensive analysis of the current capabilities of generative AI in the building industry while outlining future directions for research and development, aiming to pave the way for smarter, more sustainable, and responsive construction practices.