Goto

Collaborating Authors

 proceeding



Robust Causal Discovery under Imperfect Structural Constraints

Wang, Zidong, Lin, Xi, He, Chuchao, Gao, Xiaoguang

arXiv.org Machine Learning

Robust causal discovery from observational data under imperfect prior knowledge remains a significant and largely unresolved challenge. Existing methods typically presuppose perfect priors or can only handle specific, pre-identified error types. And their performance degrades substantially when confronted with flawed constraints of unknown location and type. This decline arises because most of them rely on inflexible and biased thresholding strategies that may conflict with the data distribution. To overcome these limitations, we propose to harmonizes knowledge and data through prior alignment and conflict resolution. First, we assess the credibility of imperfect structural constraints through a surrogate model, which then guides a sparse penalization term measuring the loss between the learned and constrained adjacency matrices. We theoretically prove that, under ideal assumption, the knowledge-driven objective aligns with the data-driven objective. Furthermore, to resolve conflicts when this assumption is violated, we introduce a multi-task learning framework optimized via multi-gradient descent, jointly minimizing both objectives. Our proposed method is robust to both linear and nonlinear settings. Extensive experiments, conducted under diverse noise conditions and structural equation model types, demonstrate the effectiveness and efficiency of our method under imperfect structural constraints.


Transferable Direct Prompt Injection via Activation-Guided MCMC Sampling

Li, Minghui, Zhang, Hao, Zhang, Yechao, Wan, Wei, Hu, Shengshan, Xiaobing, pei, Wang, Jing

arXiv.org Artificial Intelligence

Direct Prompt Injection (DPI) attacks pose a critical security threat to Large Language Models (LLMs) due to their low barrier of execution and high potential damage. To address the impracticality of existing white-box/gray-box methods and the poor transferability of black-box methods, we propose an activations-guided prompt injection attack framework. We first construct an Energy-based Model (EBM) using activations from a surrogate model to evaluate the quality of adversarial prompts. Guided by the trained EBM, we employ the token-level Markov Chain Monte Carlo (MCMC) sampling to adaptively optimize adversarial prompts, thereby enabling gradient-free black-box attacks. Experimental results demonstrate our superior cross-model transferability, achieving 49.6% attack success rate (ASR) across five mainstream LLMs and 34.6% improvement over human-crafted prompts, and maintaining 36.6% ASR on unseen task scenarios. Interpretability analysis reveals a correlation between activations and attack effectiveness, highlighting the critical role of semantic patterns in transferable vulnerability exploitation.


LauraTSE: Target Speaker Extraction using Auto-Regressive Decoder-Only Language Models

Tang, Beilong, Zeng, Bang, Li, Ming

arXiv.org Artificial Intelligence

--We propose LauraTSE, an Auto-Regressive Decoder-Only Language Model for T arget Speaker Extraction built upon the LauraGPT backbone. LauraTSE employs a small-scale auto-regressive decoder-only language model that generates the initial layers of the target speech's discrete codec representations from the continuous embeddings of both the mixture and reference speech. These outputs serve as coarse-grained predictions. T o refine them, a one-step encoder-only language model reconstructs the full codec representation by integrating information from both the mixture and the reference speech, adding fine-grained details. Experimental results show that our approach can achieve promising performance. Additionally, we conduct ablation studies to investigate the data scalability and the contribution of the encoder-only model. Target Speaker Extraction (TSE) aims at extracting target speaker's speech from a mixture using auxiliary information like reference speech, spatial information, or visual information etc., regarding the target speaker [1]. Current dominant approaches utilize discriminative models which try to directly map the mixture speech to target clean speech [2]- [5]. However, this method might struggle for unseen data and sometimes even introduce undesirable distortions [6].



Equilibrium-Driven Smooth Separation and Navigation of Marsupial Robotic Systems

Hu, Bin-Bin, Jayawardhana, Bayu, Cao, Ming

arXiv.org Artificial Intelligence

--In this paper, we propose an equilibrium-driven controller that enables a marsupial carrier-passenger robotic system to achieve smooth carrier-passenger separation and then to navigate the passenger robot toward a predetermined target point. This introduces multiple equilibrium points corresponding to the zero state of the error dynamic system during carrier-passenger separation. The change of equilibrium points is associated with the change in their attraction regions, enabling smooth carrier-passenger separation and afterwards seamless navigation toward the target. Finally, simulations demonstrate the effectiveness and adaptability of the proposed controller in environments containing obstacles. Over the years, the coordination of multi-robot systems (MRSs) has played an important role in a wide range of applications, including urban search and rescue, environmental monitoring, delivery, and collaborative convoying [1]-[3].


RBFleX-NAS: Training-Free Neural Architecture Search Using Radial Basis Function Kernel and Hyperparameter Detection

Yamasaki, Tomomasa, Wang, Zhehui, Luo, Tao, Chen, Niangjun, Wang, Bo

arXiv.org Artificial Intelligence

Neural Architecture Search (NAS) is an automated technique to design optimal neural network architectures for a specific workload. Conventionally, evaluating candidate networks in NAS involves extensive training, which requires significant time and computational resources. To address this, training-free NAS has been proposed to expedite network evaluation with minimal search time. However, state-of-the-art training-free NAS algorithms struggle to precisely distinguish well-performing networks from poorly-performing networks, resulting in inaccurate performance predictions and consequently sub-optimal top-1 network accuracy. Moreover, they are less effective in activation function exploration. To tackle the challenges, this paper proposes RBFleX-NAS, a novel training-free NAS framework that accounts for both activation outputs and input features of the last layer with a Radial Basis Function (RBF) kernel. We also present a detection algorithm to identify optimal hyperparameters using the obtained activation outputs and input feature maps. We verify the efficacy of RBFleX-NAS over a variety of NAS benchmarks. RBFleX-NAS significantly outperforms state-of-the-art training-free NAS methods in terms of top-1 accuracy, achieving this with short search time in NAS-Bench-201 and NAS-Bench-SSS. In addition, it demonstrates higher Kendall correlation compared to layer-based training-free NAS algorithms. Furthermore, we propose NAFBee, a new activation design space that extends the activation type to encompass various commonly used functions. In this extended design space, RBFleX-NAS demonstrates its superiority by accurately identifying the best-performing network during activation function search, providing a significant advantage over other NAS algorithms.


Evaluating Scenario-based Decision-making for Interactive Autonomous Driving Using Rational Criteria: A Survey

Tian, Zhen, Lin, Zhihao, Zhao, Dezong, Zhao, Wenjing, Flynn, David, Ansari, Shuja, Wei, Chongfeng

arXiv.org Artificial Intelligence

Autonomous vehicles (AVs) can significantly promote the advances in road transport mobility in terms of safety, reliability, and decarbonization. However, ensuring safety and efficiency in interactive during within dynamic and diverse environments is still a primary barrier to large-scale AV adoption. In recent years, deep reinforcement learning (DRL) has emerged as an advanced AI-based approach, enabling AVs to learn decision-making strategies adaptively from data and interactions. DRL strategies are better suited than traditional rule-based methods for handling complex, dynamic, and unpredictable driving environments due to their adaptivity. However, varying driving scenarios present distinct challenges, such as avoiding obstacles on highways and reaching specific exits at intersections, requiring different scenario-specific decision-making algorithms. Many DRL algorithms have been proposed in interactive decision-making. However, a rationale review of these DRL algorithms across various scenarios is lacking. Therefore, a comprehensive evaluation is essential to assess these algorithms from multiple perspectives, including those of vehicle users and vehicle manufacturers. This survey reviews the application of DRL algorithms in autonomous driving across typical scenarios, summarizing road features and recent advancements. The scenarios include highways, on-ramp merging, roundabouts, and unsignalized intersections. Furthermore, DRL-based algorithms are evaluated based on five rationale criteria: driving safety, driving efficiency, training efficiency, unselfishness, and interpretability (DDTUI). Each criterion of DDTUI is specifically analyzed in relation to the reviewed algorithms. Finally, the challenges for future DRL-based decision-making algorithms are summarized.


DDIM sampling for Generative AIBIM, a faster intelligent structural design framework

He, Zhili, Wang, Yu-Hsing

arXiv.org Artificial Intelligence

Generative AIBIM, a successful structural design pipeline, has proven its ability to intelligently generate high-quality, diverse, and creative shear wall designs that are tailored to specific physical conditions. However, the current module of Generative AIBIM that generates designs, known as the physics-based conditional diffusion model (PCDM), necessitates 1000 iterations for each generation due to its reliance on the denoising diffusion probabilistic model (DDPM) sampling process. This leads to a time-consuming and computationally demanding generation process. To address this issue, this study introduces the denoising diffusion implicit model (DDIM), an accelerated generation method that replaces the DDPM sampling process in PCDM. While the original DDIM was designed for DDPM and the optimization process of PCDM differs from that of DDPM, this paper designs "DDIM sampling for PCDM," which modifies the original DDIM formulations to adapt to the optimization process of PCDM. Experimental results demonstrate that DDIM sampling for PCDM can accelerate the generation process of the original PCDM by a factor of 100 while maintaining the same visual quality in the generated results. This study effectively showcases the effectiveness of DDIM sampling for PCDM in expediting intelligent structural design. Furthermore, this paper reorganizes the contents of DDIM, focusing on the practical usage of DDIM. This change is particularly meaningful for researchers who may not possess a strong background in machine learning theory but are interested in utilizing the tool effectively.


ScholarChemQA: Unveiling the Power of Language Models in Chemical Research Question Answering

Chen, Xiuying, Wang, Tairan, Guo, Taicheng, Guo, Kehan, Zhou, Juexiao, Li, Haoyang, Zhuge, Mingchen, Schmidhuber, Jürgen, Gao, Xin, Zhang, Xiangliang

arXiv.org Artificial Intelligence

Question Answering (QA) effectively evaluates language models' reasoning and knowledge depth. While QA datasets are plentiful in areas like general domain and biomedicine, academic chemistry is less explored. Chemical QA plays a crucial role in both education and research by effectively translating complex chemical information into readily understandable format. Addressing this gap, we introduce ScholarChemQA, a large-scale QA dataset constructed from chemical papers. This dataset reflects typical real-world challenges, including an imbalanced data distribution and a substantial amount of unlabeled data that can be potentially useful. Correspondingly, we introduce a QAMatch model, specifically designed to effectively answer chemical questions by fully leveraging our collected data. We first address the issue of imbalanced label distribution by re-weighting the instance-wise loss based on the inverse frequency of each class, ensuring minority classes are not dominated by majority ones during optimization. Next, we utilize the unlabeled data to enrich the learning process, generating a variety of augmentations based on a SoftMix operation and ensuring their predictions align with the same target, i.e., pseudo-labels. To ensure the quality of the pseudo-labels, we propose a calibration procedure aimed at closely aligning the pseudo-label estimates of individual samples with a desired ground truth distribution. Experiments show that our QAMatch significantly outperforms the recent similar-scale baselines and Large Language Models (LLMs) not only on our ScholarChemQA dataset but also on four benchmark datasets. We hope our benchmark and model can facilitate and promote more research on chemical QA.