AITopics

2503.1696

Country: Europe > United Kingdom > England > Nottinghamshire > Nottingham (0.15)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.93)

arXiv.org Artificial IntelligenceMar-10-2025

Efficient Membership Inference Attacks by Bayesian Neural Network

Liu, Zhenlong, Jiang, Wenyu, Zhou, Feng, Wei, Hongxin

Membership Inference Attacks (MIAs) aim to estimate whether a specific data point was used in the training of a given model. Previous attacks often utilize multiple reference models to approximate the conditional score distribution, leading to significant computational overhead. While recent work leverages quantile regression to estimate conditional thresholds, it fails to capture epistemic uncertainty, resulting in bias in low-density regions. In this work, we propose a novel approach - Bayesian Membership Inference Attack (BMIA), which performs conditional attack through Bayesian inference. In particular, we transform a trained reference model into Bayesian neural networks by Laplace approximation, enabling the direct estimation of the conditional score distribution by probabilistic model parameters. Our method addresses both epistemic and aleatoric uncertainty with only a reference model, enabling efficient and powerful MIA. Extensive experiments on five datasets demonstrate the effectiveness and efficiency of BMIA.

artificial intelligence, bayesian inference, machine learning, (11 more...)

2503.07482

Country: North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report > Promising Solution (0.34)

Industry:

Information Technology > Security & Privacy (0.93)
Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Artificial IntelligenceMar-5-2025

Integrating Protein Dynamics into Structure-Based Drug Design via Full-Atom Stochastic Flows

Zhou, Xiangxin, Xiao, Yi, Lin, Haowei, He, Xinheng, Guan, Jiaqi, Wang, Yang, Liu, Qiang, Zhou, Feng, Wang, Liang, Ma, Jianzhu

The dynamic nature of proteins, influenced by ligand interactions, is essential for comprehending protein function and progressing drug discovery. Traditional structure-based drug design (SBDD) approaches typically target binding sites with rigid structures, limiting their practical application in drug development. While molecular dynamics simulation can theoretically capture all the biologically relevant conformations, the transition rate is dictated by the intrinsic energy barrier between them, making the sampling process computationally expensive. To overcome the aforementioned challenges, we propose to use generative modeling for SBDD considering conformational changes of protein pockets. We curate a dataset of apo and multiple holo states of protein-ligand complexes, simulated by molecular dynamics, and propose a full-atom flow model (and a stochastic version), named DynamicFlow, that learns to transform apo pockets and noisy ligands into holo pockets and corresponding 3D ligand molecules. Additionally, the resultant holo-like states provide superior inputs for traditional SBDD approaches, playing a significant role in practical drug discovery. Modern deep learning is advancing several areas within drug discovery. Notably, among these, structure-based drug design (SBDD) (Anderson, 2003) emerges as a particularly significant and challenging domain. SBDD aims to discover drug-like ligand molecules specifically tailored to target binding sites. However, the complexity of chemical space and the dynamic nature of molecule conformations make traditional methods such as high throughput and virtual screenings inefficient. Additionally, relying on compound databases limits the diversity of identified molecules. Thus, deep generative models, such as autoregressive models (Luo et al., 2021; Peng et al., 2022) and diffusion models (Guan et al., 2023; Schneuing et al., 2022), have been introduced as a tool for de novo 3D ligand molecule design based on binding pockets, significantly transforming research paradigms. However, most SBDD methods based on deep generative models assume that proteins are rigid (Peng et al., 2022; Guan et al., 2024). However, the dynamic behavior of proteins is crucial for practical drug discovery (Karelina et al., 2023; Boehr et al., 2009). Thermodynamic fluctuations result in proteins existing as an ensemble of various conformational states, and such states may interact with different drug molecules. During binding, the protein's structure may undergo fine-tuning, adopting different conformations to optimize its interaction with the drug, a phenomenon referred to as induced fit (Sherman et al., 2006).

artificial intelligence, machine learning, molecule, (17 more...)

2503.03989

Country: North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.44)

arXiv.org Machine LearningFeb-26-2025

Enhancing Gradient-based Discrete Sampling via Parallel Tempering

Liang, Luxu, Jia, Yuhang, Zhou, Feng

While gradient-based discrete samplers are effective in sampling from complex distributions, they are susceptible to getting trapped in local minima, particularly in high-dimensional, multimodal discrete distributions, owing to the discontinuities inherent in these landscapes. To circumvent this issue, we combine parallel tempering, also known as replica exchange, with the discrete Langevin proposal and develop the Parallel Tempering enhanced Discrete Langevin Proposal (PTDLP), which are simulated at a series of temperatures. Significant energy differences prompt sample swaps, which are governed by a Metropolis criterion specifically designed for discrete sampling to ensure detailed balance is maintained. Additionally, we introduce an automatic scheme to determine the optimal temperature schedule and the number of chains, ensuring adaptability across diverse tasks with minimal tuning. Theoretically, we establish that our algorithm converges non-asymptotically to the target energy and exhibits faster mixing compared to a single chain. Empirical results further emphasize the superiority of our method in sampling from complex, multimodal discrete distributions, including synthetic problems, restricted Boltzmann machines, and deep energy-based models.

artificial intelligence, enhancing gradient-based discrete sampling, machine learning, (15 more...)

2502.1924

Country:

Asia (0.14)
North America (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)

arXiv.org Artificial IntelligenceFeb-10-2025

Language-TPP: Integrating Temporal Point Processes with Language Models for Event Analysis

Kong, Quyu, Zhang, Yixuan, Liu, Yang, Tong, Panrong, Liu, Enqi, Zhou, Feng

Temporal Point Processes (TPPs) have been widely used for event sequence modeling, but they often struggle to incorporate rich textual event descriptions effectively. Conversely, while Large Language Models (LLMs) have been shown remarkable capabilities in processing textual data, they lack mechanisms for handling temporal dynamics. To bridge this gap, we introduce Language-TPP, a unified framework that integrates TPPs with LLMs for enhanced event sequence modeling. Language-TPP introduces a novel temporal encoding mechanism that converts continuous time intervals into specialized byte-tokens, enabling seamless integration with standard LLM architectures. This approach allows Language-TPP to achieve state-of-the-art performance across multiple TPP tasks, including event time prediction, type prediction, and intensity estimation, on five datasets. Additionally, we demonstrate that incorporating temporal information significantly improves the quality of generated event descriptions.

large language model, machine learning, natural language, (17 more...)

2502.07139

Country: North America > United States (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Machine LearningJan-24-2025

Advances in Temporal Point Processes: Bayesian, Deep, and LLM Approaches

Zhou, Feng, Kong, Quyu, Zhang, Yixuan

Temporal point processes (TPPs) are stochastic process models used to characterize event sequences occurring in continuous time. Traditional statistical TPPs have a long-standing history, with numerous models proposed and successfully applied across diverse domains. In recent years, advances in deep learning have spurred the development of neural TPPs, enabling greater flexibility and expressiveness in capturing complex temporal dynamics. The emergence of large language models (LLMs) has further sparked excitement, offering new possibilities for modeling and analyzing event sequences by leveraging their rich contextual understanding. This survey presents a comprehensive review of recent research on TPPs from three perspectives: Bayesian, deep learning, and LLM approaches. We begin with a review of the fundamental concepts of TPPs, followed by an in-depth discussion of model design and parameter estimation techniques in these three frameworks. We also revisit classic application areas of TPPs to highlight their practical relevance. Finally, we outline challenges and promising directions for future research.

large language model, machine learning, natural language, (20 more...)

2501.14291

Genre:

Research Report (1.00)
Overview (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

arXiv.org Artificial IntelligenceJan-6-2025

Socratic Questioning: Learn to Self-guide Multimodal Reasoning in the Wild

Hu, Wanpeng, Liu, Haodi, Chen, Lin, Zhou, Feng, Xiao, Changming, Yang, Qi, Zhang, Changshui

Complex visual reasoning remains a key challenge today. Typically, the challenge is tackled using methodologies such as Chain of Thought (COT) and visual instruction tuning. However, how to organically combine these two methodologies for greater success remains unexplored. Also, issues like hallucinations and high training cost still need to be addressed. In this work, we devise an innovative multi-round training and reasoning framework suitable for lightweight Multimodal Large Language Models (MLLMs). Our self-questioning approach heuristically guides MLLMs to focus on visual clues relevant to the target problem, reducing hallucinations and enhancing the model's ability to describe fine-grained image details. This ultimately enables the model to perform well in complex visual reasoning and question-answering tasks. We have named this framework Socratic Questioning(SQ). To facilitate future research, we create a multimodal mini-dataset named CapQA, which includes 1k images of fine-grained activities, for visual instruction tuning and evaluation, our proposed SQ method leads to a 31.2% improvement in the hallucination score. Our extensive experiments on various benchmarks demonstrate SQ's remarkable capabilities in heuristic self-questioning, zero-shot visual reasoning and hallucination mitigation. Our model and code will be publicly available.

arxiv preprint, large language model, machine learning, (19 more...)

2501.02964

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial IntelligenceDec-29-2024

Align Attention Heads Before Merging Them: An Effective Way for Converting MHA to GQA

Jin, Qingyun, Song, Xiaohui, Zhou, Feng, Qin, Zengchang

Large language models have been shown to perform well on a variety of natural language processing problems. However, as the model size and the input sequence's length increase, the rapid increase of KV Cache significantly slows down inference speed. Therefore GQA model, as an alternative to MHA model, has been widely introduced into LLMs. In this work, we propose a low-cost method for pruning MHA models into GQA models with any compression ratio of key-value heads. Our method is based on $\mathit{L_0}$ masks to gradually remove redundant parameters. In addition, we apply orthogonal transformations to attention heads without changing the model to increase similarity between attention heads before pruning training, in order to further improve performance of the model. Our method can be compatible with rotary position embedding (RoPE), which means the model after training can be fully adapted to the mainstream standard GQA framework. Experiments demonstrate that our strategy can compress up to 87.5% of key-value heads of the LLaMA2-7B model without too much performance degradation, just achieved through supervised fine-tuning.

artificial intelligence, large language model, natural language, (15 more...)

2412.20677

Country: Asia > China (0.14)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

arXiv.org Machine LearningDec-25-2024

Task Diversity in Bayesian Federated Learning: Simultaneous Processing of Classification and Regression

Lyu, Junliang, Zhang, Yixuan, Lu, Xiaoling, Zhou, Feng

This work addresses a key limitation in current federated learning approaches, which predominantly focus on homogeneous tasks, neglecting the task diversity on local devices. We propose a principled integration of multi-task learning using multi-output Gaussian processes (MOGP) at the local level and federated learning at the global level. MOGP handles correlated classification and regression tasks, offering a Bayesian non-parametric approach that naturally quantifies uncertainty. The central server aggregates the posteriors from local devices, updating a global MOGP prior redistributed for training local models until convergence. Challenges in performing posterior inference on local devices are addressed through the P\'{o}lya-Gamma augmentation technique and mean-field variational inference, enhancing computational efficiency and convergence rate. Experimental results on both synthetic and real data demonstrate superior predictive performance, OOD detection, uncertainty calibration and convergence rate, highlighting the method's potential in diverse applications. Our code is publicly available at https://github.com/JunliangLv/task_diversity_BFL.

artificial intelligence, bayesian inference, machine learning, (16 more...)

2412.10897

Country:

Asia > China (0.47)
North America > United States (0.46)
North America > Canada > Ontario > Toronto (0.15)

Genre: Research Report > New Finding (0.68)

Industry:

Information Technology (0.68)
Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

arXiv.org Machine LearningDec-18-2024

Nonstationary Sparse Spectral Permanental Process

Sun, Zicheng, Zhang, Yixuan, Ling, Zenan, Fan, Xuhui, Zhou, Feng

Existing permanental processes often impose constraints on kernel types or stationarity, limiting the model's expressiveness. To overcome these limitations, we propose a novel approach utilizing the sparse spectral representation of nonstationary kernels. This technique relaxes the constraints on kernel types and stationarity, allowing for more flexible modeling while reducing computational complexity to the linear level. Additionally, we introduce a deep kernel variant by hierarchically stacking multiple spectral feature mappings, further enhancing the model's expressiveness to capture complex patterns in data. Experimental results on both synthetic and real-world datasets demonstrate the effectiveness of our approach, particularly in scenarios with pronounced data nonstationarity. Additionally, ablation studies are conducted to provide insights into the impact of various hyperparameters on model performance.

artificial intelligence, machine learning, pattern recognition, (18 more...)

2410.03581

Country: Asia > China (0.28)

Genre: Research Report > Experimental Study (0.93)

Industry: Transportation (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.34)