Evolution Meets Diffusion: Efficient Neural Architecture Generation

arXiv.org Artificial Intelligence

Neural Architecture Search (NAS) has gained widespread attention for its transformative potential in deep learning model design. However, the vast and complex search space of NAS incurs significant computational and time costs. Neural Architecture Generation (NAG) addresses this by reframing NAS as a generation problem, enabling precise generation of optimal architectures for specific tasks. Despite its promise, mainstream methods such as diffusion models have limited global search capability and remain hindered by high computational and time demands. To overcome these challenges, we propose Evolutionary Diffusion-based Neural Architecture Generation (EDNAG), a novel approach that achieves efficient, training-free architecture generation. EDNAG uses evolutionary algorithms to simulate the denoising process of diffusion models, with fitness guiding the transition from a random Gaussian distribution to the distribution of optimal architectures. This approach combines the strengths of evolutionary strategies and diffusion models, enabling rapid and effective architecture generation. Extensive experiments demonstrate that EDNAG achieves state-of-the-art (SOTA) performance in architecture optimization, improving accuracy by up to 10.45%. Furthermore, it eliminates time-consuming training and speeds up inference by 50 times on average, demonstrating both its efficiency and its effectiveness.
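
As a rough illustration of fitness-guided evolutionary denoising, the Python sketch below starts a population from Gaussian noise and, at each step, treats the mean of the fittest individuals as the "denoised" estimate before re-sampling around it with a shrinking noise level. This is a cross-entropy-style stand-in on a toy continuous problem, not EDNAG's actual update rule or architecture encoding; the `fitness` function, the linear noise schedule, and the elite fraction are all hypothetical choices.

```python
import random


def fitness(x, target):
    # Hypothetical toy fitness: negative squared distance to a target vector.
    return -sum((xi - ti) ** 2 for xi, ti in zip(x, target))


def evolutionary_denoise(target, pop_size=64, steps=50, elite_frac=0.25,
                         sigma_max=1.0, sigma_min=0.01, seed=0):
    rng = random.Random(seed)
    dim = len(target)
    n_elite = max(1, int(pop_size * elite_frac))
    # Initial population drawn from a standard Gaussian, like the start of a
    # diffusion sampler.
    pop = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(pop_size)]
    best = max(pop, key=lambda x: fitness(x, target))
    for t in range(steps):
        # Linearly decaying noise level plays the role of a denoising schedule.
        sigma = sigma_max + (sigma_min - sigma_max) * t / (steps - 1)
        # Fitness-based truncation selection.
        elites = sorted(pop, key=lambda x: fitness(x, target), reverse=True)[:n_elite]
        # The elite mean serves as the "denoised" estimate at this step.
        x0_hat = [sum(e[i] for e in elites) / n_elite for i in range(dim)]
        # Re-sample the population around the estimate at the current noise level.
        pop = [[m + sigma * rng.gauss(0.0, 1.0) for m in x0_hat]
               for _ in range(pop_size)]
        cand = max(pop, key=lambda x: fitness(x, target))
        if fitness(cand, target) > fitness(best, target):
            best = cand  # keep the best individual seen so far
    return best


target = [1.0, 1.0, 1.0, 1.0]
best = evolutionary_denoise(target)
print(round(fitness(best, target), 4))
```

Tracking the best individual across all steps makes the returned solution monotone in fitness even when a late, low-noise generation happens to be slightly worse.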


Space Group Conditional Flow Matching

arXiv.org Artificial Intelligence

Inorganic crystals are periodic, highly symmetric arrangements of atoms in three-dimensional space. Their structures are constrained by the symmetry operations of a crystallographic space group and restricted to lie in specific affine subspaces known as Wyckoff positions. An atom's Wyckoff position determines how frequently it appears in the crystal and its approximate position. Most generative models that predict atomic coordinates overlook these symmetry constraints, producing an unrealistically high proportion of crystals with limited symmetry. We introduce Space Group Conditional Flow Matching, a novel generative framework that samples significantly closer to the target population of highly symmetric, stable crystals. We achieve this by conditioning the entire generation process on a given space group and set of Wyckoff positions; specifically, we define a conditionally symmetric noise base distribution and a group-conditioned, equivariant, parametric vector field that restricts the motion of atoms to their initial Wyckoff positions. This group-conditioned equivariance is achieved through an efficient reformulation of group averaging tailored to symmetric crystals, which reduces the computational overhead of symmetrization to a negligible level. We achieve state-of-the-art results on crystal structure prediction and de novo generation benchmarks, and we also perform relevant ablations.
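
Group averaging itself can be sketched on a much simpler case than a space group. Assuming only the cyclic group C4 of 90-degree planar rotations and a hypothetical, non-symmetric `raw_field`, the averaged field f_sym(x) = (1/|G|) Σ_g g⁻¹ f(g x) is equivariant by construction. The paper's efficient reformulation for space groups (which also include translations) and Wyckoff positions is not reproduced here.

```python
import math

# The cyclic group C4 of planar rotations by multiples of 90 degrees,
# written as 2x2 orthogonal matrices (so each inverse is the transpose).
C4 = [
    ((1, 0), (0, 1)),
    ((0, -1), (1, 0)),
    ((-1, 0), (0, -1)),
    ((0, 1), (-1, 0)),
]


def matvec(m, v):
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])


def transpose(m):
    return ((m[0][0], m[1][0]), (m[0][1], m[1][1]))


def raw_field(x):
    # Arbitrary vector field with no symmetry (hypothetical stand-in for a network).
    return (x[0] ** 2 + 0.3 * x[1], math.sin(x[0]) - x[1])


def group_average(field, group):
    # f_sym(x) = (1/|G|) * sum_g g^{-1} f(g x); the result is G-equivariant.
    def symmetric_field(x):
        acc = [0.0, 0.0]
        for g in group:
            y = matvec(transpose(g), field(matvec(g, x)))
            acc[0] += y[0]
            acc[1] += y[1]
        return (acc[0] / len(group), acc[1] / len(group))
    return symmetric_field


f_sym = group_average(raw_field, C4)
x = (0.7, -1.2)
for h in C4:
    # Equivariance check: f_sym(h x) should equal h f_sym(x).
    lhs = f_sym(matvec(h, x))
    rhs = matvec(h, f_sym(x))
    print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(lhs, rhs)))
```

The cost grows linearly with |G| here, which is exactly the overhead the paper's reformulation is said to reduce for crystallographic groups.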


Pure Node Selection for Imbalanced Graph Node Classification

arXiv.org Artificial Intelligence

Class imbalance refers to an uneven distribution of samples among classes in a dataset, where some classes are significantly underrepresented compared to others. Class imbalance is also prevalent in graph-structured data. Graph neural networks (GNNs) typically assume class balance and often overlook class imbalance. In our investigation, we identified a problem, which we term the Randomness Anomalous Connectivity Problem (RACP), in which certain off-the-shelf models are affected by random seeds, leading to significant performance degradation. To eliminate the influence of such random factors, we propose PNS (Pure Node Sampling) to address RACP in the node synthesis stage. Unlike existing approaches that design specialized algorithms to handle either quantity imbalance or topological imbalance, PNS is a novel plug-and-play module that operates directly during node synthesis to mitigate RACP. Moreover, PNS also alleviates performance degradation caused by an abnormal distribution of node neighbors. We conduct a series of experiments to identify which factors are influenced by random seeds. Experimental results demonstrate the effectiveness and stability of our method, which not only eliminates the effect of unfavorable random seeds but also outperforms the baseline across various benchmark datasets with different GNN backbones. Data and code are available at https://github.com/flzeng1/PNS.


From Static to Dynamic: a Survey of Topology-Aware Perception in Autonomous Driving

arXiv.org Artificial Intelligence

The key to achieving autonomous driving lies in topology-aware perception, the structured understanding of the driving environment with an emphasis on lane topology and road semantics. This survey systematically reviews four core research directions under this theme: vectorized map construction, topological structure modeling, prior knowledge fusion, and language model-based perception. Across these directions, we observe a unifying trend: a paradigm shift from static, pre-built maps to dynamic, sensor-driven perception. Specifically, traditional static maps have provided semantic context for autonomous systems. However, they are costly to construct, difficult to update in real time, and lack generalization across regions, limiting their scalability. In contrast, dynamic representations leverage on-board sensor data for real-time map construction and topology reasoning. Each of the four research directions contributes to this shift through compact spatial modeling, semantic relational reasoning, robust domain knowledge integration, and multimodal scene understanding powered by pre-trained language models. Together, they pave the way for more adaptive, scalable, and explainable autonomous driving systems.


Pancreas Part Segmentation under Federated Learning Paradigm

arXiv.org Artificial Intelligence

We present the first federated learning (FL) approach for segmenting pancreas parts (head, body, tail) in MRI, addressing a critical clinical challenge. Pancreatic diseases exhibit marked regional heterogeneity: cancers predominantly occur in the head region, while chronic pancreatitis causes tissue loss in the tail. Accurate segmentation of the organ into head, body, and tail regions is therefore essential for precise diagnosis and treatment planning. This segmentation task remains exceptionally challenging in MRI due to variable morphology, poor soft-tissue contrast, and anatomical variation across patients. Our contribution tackles two fundamental challenges: first, the technical complexity of delineating pancreas parts in MRI, and second, the data scarcity that has hindered prior approaches. We introduce a privacy-preserving FL framework that enables collaborative model training across seven medical institutions without direct data sharing, leveraging a diverse dataset of 711 T1W and 726 T2W MRI scans. Our key innovations include: (1) a systematic evaluation, not previously reported, of three state-of-the-art segmentation architectures (U-Net, Attention U-Net, Swin UNETR) paired with two FL algorithms (FedAvg, FedProx), which identifies Attention U-Net with FedAvg as the best choice for handling pancreatic heterogeneity; and (2) a novel anatomically informed loss function that prioritizes region-specific texture contrasts in MRI. Comprehensive evaluation demonstrates that our approach achieves clinically viable performance despite training on distributed, heterogeneous datasets.
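
For reference, the standard server-side FedAvg step averages client parameters weighted by local dataset size, and FedProx adds a proximal term (μ/2)·||w_local − w_global||² to each client's local objective. The sketch below uses plain Python lists as hypothetical flattened model parameters and made-up site sizes; it is not the authors' training pipeline.

```python
def fedavg(client_weights, client_sizes):
    """Size-weighted average of client parameter vectors (FedAvg aggregation)."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def fedprox_penalty(local_weights, global_weights, mu):
    """FedProx proximal term (mu / 2) * ||w_local - w_global||^2 added to the local loss."""
    return 0.5 * mu * sum((a - b) ** 2 for a, b in zip(local_weights, global_weights))


# Hypothetical example: three sites with different numbers of scans.
clients = [[0.2, -1.0], [0.4, -0.8], [0.1, -1.2]]
sizes = [120, 80, 200]
global_w = fedavg(clients, sizes)
print(global_w)
print(fedprox_penalty(clients[0], global_w, mu=0.01))
```

Only parameters leave each site in this scheme; the raw scans stay local, which is what makes the setup privacy-preserving.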


Zero-shot Whole-Body Manipulation with a Large-Scale Soft Robotic Torso via Guided Reinforcement Learning

arXiv.org Artificial Intelligence

Whole-body manipulation is a powerful yet underexplored approach that enables robots to interact with large, heavy, or awkward objects using more than just their end-effectors. Soft robots, with their inherent passive compliance, are particularly well suited to such contact-rich manipulation tasks, but their uncertainties in kinematics and dynamics pose significant challenges for simulation and control. In this work, we address this challenge with a simulation that runs up to 350x faster than real time on a single thread in MuJoCo, and we provide a detailed analysis of the critical tradeoffs between speed and accuracy for this simulation. Using this framework, we demonstrate a successful zero-shot sim-to-real transfer of a learned whole-body manipulation policy, achieving an 88% success rate on the Baloo hardware platform. We show that guiding RL with a simple motion primitive is critical to this success, whereas standard reward shaping methods struggled to produce a stable and successful policy for whole-body manipulation. Furthermore, our analysis reveals that the learned policy does not simply mimic the motion primitive; it exhibits beneficial reactive behavior, such as re-grasping and perturbation recovery. We analyze and contrast this learned policy against an open-loop baseline and show that the policy can also exhibit aggressive over-corrections under perturbation. To our knowledge, this is the first demonstration of forceful, six-DoF whole-body manipulation using two continuum soft arms on a large-scale platform (10 kg payloads) with zero-shot policy transfer.


Fusing Sequence Motifs and Pan-Genomic Features: Antimicrobial Resistance Prediction using an Explainable Lightweight 1D CNN-XGBoost Ensemble

arXiv.org Artificial Intelligence

Antimicrobial Resistance (AMR) is a rapidly escalating global health crisis. While genomic sequencing enables rapid prediction of resistance phenotypes, current computational methods have limitations. Standard machine learning models treat the genome as an unordered collection of features, ignoring the sequential context of Single Nucleotide Polymorphisms (SNPs). State-of-the-art sequence models like Transformers are often too data-hungry and computationally expensive for the moderately-sized datasets that are typical in this domain. To address these challenges, we propose AMR-EnsembleNet, an ensemble framework that synergistically combines sequence-based and feature-based learning. We developed a lightweight, custom 1D Convolutional Neural Network (CNN) to efficiently learn predictive sequence motifs from high-dimensional SNP data. This sequence-aware model was ensembled with an XGBoost model, a powerful gradient boosting system adept at capturing complex, non-local feature interactions. We trained and evaluated our framework on a benchmark dataset of 809 E. coli strains, predicting resistance across four antibiotics with varying class imbalance. Our 1D CNN-XGBoost ensemble consistently achieved top-tier performance across all the antibiotics, reaching a Matthews Correlation Coefficient (MCC) of 0.926 for Ciprofloxacin (CIP) and the highest Macro F1-score of 0.691 for the challenging Gentamicin (GEN) AMR prediction. We also show that our model consistently focuses on SNPs within well-known AMR genes like fusA and parC, confirming it learns the correct genetic signals for resistance. Our work demonstrates that fusing a sequence-aware 1D CNN with a feature-based XGBoost model creates a powerful ensemble, overcoming the limitations of using either an order-agnostic or a standalone sequence model.
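
The abstract does not state how the two models are combined. Assuming simple soft voting (averaging the CNN and XGBoost positive-class probabilities) with a 0.5 threshold, a minimal sketch of the ensemble step and of the Matthews Correlation Coefficient used for evaluation looks like this; `soft_vote` and the probabilities are hypothetical.

```python
import math


def soft_vote(prob_a, prob_b, weight_a=0.5):
    # Weighted average of two models' positive-class probabilities (assumed scheme).
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(prob_a, prob_b)]


def mcc(y_true, y_pred):
    # Matthews Correlation Coefficient from confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Made-up probabilities for four isolates (1 = resistant).
p_cnn = [0.9, 0.2, 0.6, 0.4]
p_xgb = [0.7, 0.4, 0.8, 0.2]
y_true = [1, 0, 1, 0]
y_pred = [1 if p >= 0.5 else 0 for p in soft_vote(p_cnn, p_xgb)]
print(y_pred, mcc(y_true, y_pred))
```

MCC uses all four confusion-matrix cells, which is why it is a common choice when resistant and susceptible classes are imbalanced.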


Text-Based Approaches to Item Difficulty Modeling in Large-Scale Assessments: A Systematic Review

arXiv.org Artificial Intelligence

Item difficulty plays a crucial role in test performance, score interpretability, and equity for all test-takers, especially in large-scale assessments. Traditional approaches to item difficulty modeling rely on field testing and classical test theory (CTT)-based item analysis or item response theory (IRT) calibration, which can be time-consuming and costly. To overcome these challenges, text-based approaches leveraging machine learning and language models have emerged as promising alternatives. This paper reviews and synthesizes 37 articles on automated item difficulty prediction in large-scale assessment settings published through May 2025. For each study, we delineate the dataset, difficulty parameter, subject domain, item type, number of items, training and test data split, input, features, model, evaluation criteria, and model performance outcomes. Results showed that although classic machine learning models remain relevant due to their interpretability, state-of-the-art language models, using both small and large transformer-based architectures, can capture syntactic and semantic patterns without the need for manual feature engineering. Uniquely, we summarize model performance outcomes to serve as a benchmark for future research. Overall, text-based methods have the potential to predict item difficulty with root mean square error (RMSE) as low as 0.165, Pearson correlation as high as 0.87, and accuracy as high as 0.806. The review concludes by discussing implications for practice and outlining future research directions for automated item difficulty modeling.
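
The RMSE and Pearson correlation figures quoted above compare predicted item difficulties against calibrated ones. A minimal standard-library sketch of both metrics follows; the difficulty values are made up for illustration.

```python
import math


def rmse(pred, true):
    # Root mean square error between predicted and calibrated difficulties.
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


def pearson(x, y):
    # Pearson correlation coefficient (sample covariance over product of std devs).
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical predicted vs. IRT-calibrated difficulties for four items.
predicted = [-0.4, 0.1, 0.9, 1.3]
calibrated = [-0.5, 0.3, 0.7, 1.5]
print(round(rmse(predicted, calibrated), 3), round(pearson(predicted, calibrated), 3))
```

RMSE depends on the scale of the difficulty parameter while Pearson correlation does not, so RMSE values are only comparable across studies that use the same difficulty scale.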


How to Make Large Language Models Generate 100% Valid Molecules?

arXiv.org Artificial Intelligence

Molecule generation is key to drug discovery and materials science, enabling the design of novel compounds with specific properties. Large language models (LLMs) can learn to perform a wide range of tasks from just a few examples. However, generating valid molecules using representations like SMILES is challenging for LLMs in few-shot settings. In this work, we explore how LLMs can generate 100% valid molecules. We evaluate whether LLMs can use SELFIES, a representation where every string corresponds to a valid molecule, for valid molecule generation but find that LLMs perform worse with SELFIES than with SMILES. We then examine LLMs' ability to correct invalid SMILES and find their capacity limited. Finally, we introduce SmiSelf, a cross-chemical language framework for invalid SMILES correction. SmiSelf converts invalid SMILES to SELFIES using grammatical rules, leveraging SELFIES' mechanisms to correct the invalid SMILES. Experiments show that SmiSelf ensures 100% validity while preserving molecular characteristics and maintaining or even enhancing performance on other metrics. SmiSelf helps expand LLMs' practical applications in biomedicine and is compatible with all SMILES-based generative models. Code is available at https://github.com/wentao228/SmiSelf.


From Noise to Knowledge: A Comparative Study of Acoustic Anomaly Detection Models in Pumped-storage Hydropower Plants

arXiv.org Artificial Intelligence

For industrial factories and energy producers, unplanned outages are highly costly and difficult to service. However, existing acoustic anomaly detection studies largely rely on generic industrial or synthetic datasets, with few focusing on hydropower plants due to limited data access. This paper presents a comparative analysis of acoustic anomaly detection methods as a way to improve predictive maintenance in hydropower plants. We address key challenges in acoustic preprocessing under highly noisy conditions before extracting time- and frequency-domain features. We then benchmark three machine learning models, an LSTM autoencoder (LSTM AE), K-Means, and a One-Class SVM (OC-SVM), on two real-world datasets from the Rodundwerk II pumped-storage plant in Austria: one with induced anomalies and one recorded under real-world operating conditions. The OC-SVM achieved the best trade-off between accuracy (ROC AUC 0.966-0.998) and training time, while the LSTM autoencoder delivered strong detection (ROC AUC 0.889-0.997) at the expense of higher computational cost.
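
The ROC AUC values reported above can be read as the probability that a randomly chosen anomalous recording receives a higher anomaly score than a randomly chosen normal one. A minimal sketch of that pairwise (Mann-Whitney) formulation follows; the scores are hypothetical, standing in for anomaly scores such as autoencoder reconstruction errors.

```python
def roc_auc(normal_scores, anomaly_scores):
    # AUC = P(score of a random anomaly > score of a random normal sample),
    # counting ties as one half (normalized Mann-Whitney U statistic).
    wins = 0.0
    for a in anomaly_scores:
        for n in normal_scores:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(anomaly_scores) * len(normal_scores))


# Hypothetical anomaly scores (higher = more anomalous).
normal = [0.12, 0.30, 0.41]
anomalous = [0.38, 0.75, 0.90]
print(roc_auc(normal, anomalous))
```

This O(n·m) pairwise loop is written for clarity; a rank-based computation gives the same value more efficiently on large datasets.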