Xie, Linhai
Automating Exploratory Proteomics Research via Language Models
Ding, Ning, Qu, Shang, Xie, Linhai, Li, Yifei, Liu, Zaoqu, Zhang, Kaiyan, Xiong, Yibai, Zuo, Yuxin, Chen, Zhangren, Hua, Ermo, Lv, Xingtai, Sun, Youbang, Li, Yang, Li, Dong, He, Fuchu, Zhou, Bowen
With the development of artificial intelligence, its contribution to science is evolving from simulating a complex problem to automating entire research processes and producing novel discoveries. Achieving this advancement requires both specialized general models grounded in real-world scientific data and iterative, exploratory frameworks that mirror human scientific methodologies. In this paper, we present PROTEUS, a fully automated system for scientific discovery from raw proteomics data. PROTEUS uses large language models (LLMs) to perform hierarchical planning, execute specialized bioinformatics tools, and iteratively refine analysis workflows to generate high-quality scientific hypotheses. The system takes proteomics datasets as input and produces a comprehensive set of research objectives, analysis results, and novel biological hypotheses without human intervention. We evaluated PROTEUS on 12 proteomics datasets collected from various biological samples (e.g. immune cells, tumors) and different sample types (single-cell and bulk), generating 191 scientific hypotheses. These were assessed using both automatic LLM-based scoring on 5 metrics and detailed reviews from human experts. Results demonstrate that PROTEUS consistently produces reliable, logically coherent results that align well with existing literature while also proposing novel, evaluable hypotheses. The system's flexible architecture facilitates seamless integration of diverse analysis tools and adaptation to different proteomics data types. By automating complex proteomics analysis workflows and hypothesis generation, PROTEUS has the potential to considerably accelerate the pace of scientific discovery in proteomics research, enabling researchers to efficiently explore large-scale datasets and uncover biological insights.
Differentiable architecture search with multi-dimensional attention for spiking neural networks
Man, Yilei, Xie, Linhai, Qiao, Shushan, Zhou, Yumei, Shang, Delong
Spiking Neural Networks (SNNs) have gained enormous popularity in the field of artificial intelligence due to their low power consumption. However, the majority of SNN methods directly inherit the structure of Artificial Neural Networks (ANNs), which usually leads to sub-optimal model performance in SNNs. To alleviate this problem, we adopt Neural Architecture Search (NAS) and propose Multi-Attention Differentiable Architecture Search (MA-DARTS) to directly automate the search for the optimal network structure of SNNs. Initially, we defined a differentiable two-level search space and conducted experiments within the micro architecture under a fixed layer. Then, we incorporated a multi-dimensional attention mechanism and implemented the MA-DARTS algorithm in this search space. Comprehensive experiments demonstrate that our model achieves state-of-the-art classification performance compared to other methods under the same parameters, with 94.40% accuracy on the CIFAR10 dataset and 76.52% accuracy on the CIFAR100 dataset. Additionally, we monitored and assessed the number of spikes (NoS) in each cell during the whole experiment. Notably, the number of spikes of the whole model stabilized at approximately 110K in validation and 100K in training on these datasets.
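The differentiable search idea mentioned in the abstract above can be illustrated with a minimal sketch of the generic DARTS-style continuous relaxation: each edge computes a softmax-weighted mixture of candidate operations, and the final architecture keeps the operation with the largest architecture parameter. This is not the MA-DARTS implementation. The candidate operations, the names `mixed_op`, `discretize` and `CANDIDATES`, and the scalar inputs are all assumptions made for illustration, and the sketch contains no spiking neurons and no attention mechanism.

```python
import math

def softmax(alphas):
    """Numerically stable softmax over a list of architecture parameters."""
    m = max(alphas)
    exps = [math.exp(a - m) for a in alphas]
    total = sum(exps)
    return [e / total for e in exps]

def mixed_op(x, ops, alphas):
    """Continuous relaxation: weighted sum of every candidate op applied to x."""
    weights = softmax(alphas)
    return sum(w * op(x) for w, op in zip(weights, ops))

def discretize(op_names, alphas):
    """After search, keep only the op whose architecture parameter is largest."""
    best = max(range(len(alphas)), key=lambda i: alphas[i])
    return op_names[best]

# Hypothetical toy candidates standing in for real layers (convolutions, pooling, ...).
CANDIDATES = {
    "identity": lambda x: x,
    "double": lambda x: 2.0 * x,
    "zero": lambda x: 0.0,
}
```

In an actual search the architecture parameters would be trained by gradient descent alongside the network weights; here they are passed in as plain lists, so the sketch only shows the forward mixture and the final selection step.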
Learning Semantic Segmentation of Large-Scale Point Clouds with Random Sampling
Hu, Qingyong, Yang, Bo, Xie, Linhai, Rosa, Stefano, Guo, Yulan, Wang, Zhihua, Trigoni, Niki, Markham, Andrew
We study the problem of efficient semantic segmentation of large-scale 3D point clouds. By relying on expensive sampling techniques or computationally heavy pre/post-processing steps, most existing approaches can only be trained on and operate over small-scale point clouds. In this paper, we introduce RandLA-Net, an efficient and lightweight neural architecture to directly infer per-point semantics for large-scale point clouds. The key to our approach is to use random point sampling instead of more complex point selection approaches. Although remarkably computation and memory efficient, random sampling can discard key features by chance. To overcome this, we introduce a novel local feature aggregation module to progressively increase the receptive field for each 3D point, thereby effectively preserving geometric details. Comparative experiments show that our RandLA-Net can process 1 million points in a single pass up to 200× faster than existing approaches. Moreover, extensive experiments on five large-scale point cloud datasets, including Semantic3D, SemanticKITTI, Toronto3D, NPM3D and S3DIS, demonstrate the state-of-the-art semantic segmentation performance of our RandLA-Net. A key challenge is that the raw point clouds acquired by depth sensors are typically irregularly sampled, unstructured and unordered. Recently, the pioneering work PointNet [4] has emerged as a promising approach for directly processing 3D point clouds. It learns per-point features using shared multilayer perceptrons (MLPs). This is computationally efficient but fails to capture wider context information for each point.
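The trade-off described above, cheap random sampling compensated by aggregating features from each point's neighbourhood, can be sketched in a few lines. This is only an illustration under stated assumptions, not the RandLA-Net module: the brute-force nearest-neighbour search, the max-pooling of scalar features, and the function names `random_sample`, `knn_indices` and `aggregate_local` are all hypothetical stand-ins chosen for brevity.

```python
import math
import random

def random_sample(points, ratio, rng):
    """Keep a uniformly random subset of the points; no geometry is inspected."""
    keep = max(1, int(len(points) * ratio))
    return rng.sample(points, keep)

def knn_indices(points, query, k):
    """Indices of the k points closest to `query` (brute force, illustration only)."""
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], query))
    return order[:k]

def aggregate_local(points, features, sampled, k):
    """For each surviving point, max-pool the features of its k nearest
    neighbours in the full cloud, so information from dropped points is kept."""
    pooled = []
    for q in sampled:
        idx = knn_indices(points, q, k)
        pooled.append(max(features[i] for i in idx))
    return pooled
```

A possible usage, with a seeded generator so that the subset is reproducible:

```python
rng = random.Random(0)
points = [(float(i), 0.0, 0.0) for i in range(100)]
features = [float(i) for i in range(100)]
sampled = random_sample(points, 0.25, rng)
pooled = aggregate_local(points, features, sampled, k=3)
```

The brute-force search costs O(N log N) per query and is used here only to keep the example dependency-free; a spatial index would be the usual choice for large clouds.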
Neural Allocentric Intuitive Physics Prediction from Real Videos
Wang, Zhihua, Rosa, Stefano, Miao, Yishu, Lai, Zihang, Xie, Linhai, Markham, Andrew, Trigoni, Niki
Humans are able to make rich predictions about the future dynamics of physical objects from a glance. In contrast, most existing computer vision approaches require strong assumptions about the underlying system, ad-hoc modeling, or annotated datasets to carry out even simple predictions. To bridge this gap, we propose a new perspective on the problem of learning intuitive physics that is inspired by the spatial memory representation of objects and spaces in human brains, in particular the co-existence of egocentric and allocentric spatial representations. We present a generic framework that learns a layered representation of the physical world, using a cascade of invertible modules. In this framework, real images are first converted to a synthetic domain representation that reduces complexity arising from lighting and texture. Then, an allocentric viewpoint transformer removes viewpoint complexity by projecting images to a canonical view. Finally, a novel Recurrent Latent Variation Network (RLVN) architecture learns the dynamics of the objects interacting with the environment and predicts future motion, leveraging the availability of unlimited synthetic simulations. Predicted frames are then projected back to the original camera view and translated back to the real-world domain. Experimental results show that the framework can consistently and accurately predict several frames into the future and can adapt to real images.