AITopics

Country: North America > Canada (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Law (0.67)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Neural Information Processing SystemsJun-9-2026, 17:29:18 GMT

From Counterfactuals to Trees: Competitive Analysis of Model Extraction Attacks

The advent of Machine Learning as a Service (MLaaS) has heightened the trade-off between model explainability and security.

artificial intelligence, machine learning, proceedings, (5 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing SystemsApr-30-2026, 02:54:14 GMT

e5440ffceaf4831b5f98652b8a27ffde-Paper-Conference.pdf

machine learning, natural language, target model, (19 more...)

Country:

Asia (0.46)
Europe (0.45)

Genre: Research Report (0.70)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Neural Information Processing SystemsMar-19-2026, 00:42:10 GMT

Beyond Slow Signs in High-fidelity Model Extraction

Deep neural networks, costly to train and rich in intellectual property value, areincreasingly threatened by model extraction attacks that compromise their confiden-tiality. Previous attacks have succeeded in reverse-engineering model parametersup to a precision of float64 for models trained on random data with at most threehidden layers using cryptanalytical techniques. However, the process was identifiedto be very time consuming and not feasible for larger and deeper models trained onstandard benchmarks. Our study evaluates the feasibility of parameter extractionmethods of Carlini et al. [1] further enhanced by Canales-Martínez et al. [2] formodels trained on standard benchmarks. We introduce a unified codebase thatintegrates previous methods and reveal that computational tools can significantlyinfluence performance. We develop further optimisations to the end-to-end attackand improve the efficiency of extracting weight signs by up to 14.8 times com-pared to former methods through the identification of easier and harder to extractneurons. Contrary to prior assumptions, we identify extraction of weights, notextraction of weight signs, as the critical bottleneck. With our improvements, a16,721 parameter model with 2 hidden layers trained on MNIST is extracted withinonly 98 minutes compared to at least 150 minutes previously. Finally, addressingmethodological deficiencies observed in previous studies, we propose new ways ofrobust benchmarking for future model extraction attacks.

artificial intelligence, machine learning, proceedings, (5 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.96)

Neural Information Processing SystemsFeb-17-2026, 16:09:21 GMT

Marich: A Query-efficient Distributionally Equivalent Model Extraction Attack using Public Data

First, we define distributionally equivalent and Max-Information model extraction attacks, and reduce them into a variational optimisation problem.

machine learning, natural language, target model, (19 more...)

Country:

Asia > Singapore (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
Europe > Germany > Bavaria > Regensburg (0.04)
(3 more...)

Genre: Research Report (0.48)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

arXiv.org Artificial IntelligenceOct-24-2025

SecureInfer: Heterogeneous TEE-GPU Architecture for Privacy-Critical Tensors for Large Language Model Deployment

Nayan, Tushar, Zhang, Ziqi, Sun, Ruimin

Abstract--With the increasing deployment of Large Language Models (LLMs) on mobile and edge platforms, securing them against model extraction attacks has become a pressing concern. However, protecting model privacy without sacrificing the performance benefits of untrusted AI accelerators, such as GPUs, presents a challenging trade-off. In this paper, we initiate the study of high-performance execution on LLMs and present SecureInfer, a hybrid framework that leverages a heterogeneous Trusted Execution Environments (TEEs)-GPU architecture to isolate privacy-critical components while offloading compute-intensive operations to untrusted accelerators. Building upon an outsourcing scheme, SecureInfer adopts an information-theoretic and threat-informed partitioning strategy: security-sensitive components, including non-linear layers, projection of attention head, FNN transformations, and LoRA adapters are executed inside an SGX enclave, while other linear operations (matrix multiplication) are performed on the GPU after encryption and are securely restored within the enclave. We implement a prototype of SecureInfer using the LLaMA-2 model and evaluate it across performance and security metrics. Our results show that SecureInfer offers strong security guarantees with reasonable performance, offering a practical solution for secure on-device model inference.

large language model, machine learning, natural language, (19 more...)

2510.19979

Country: North America > United States > Illinois (0.28)

Genre: Research Report > New Finding (0.86)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

arXiv.org Artificial IntelligenceJul-16-2025

Striking the Perfect Balance: Preserving Privacy While Boosting Utility in Collaborative Medical Prediction Platforms

Lin, Shao-Bo, Liu, Xiaotong, Wang, Yao

Online collaborative medical prediction platforms offer convenience and real-time feedback by leveraging massive electronic health records. However, growing concerns about privacy and low prediction quality can deter patient participation and doctor cooperation. In this paper, we first clarify the privacy attacks, namely attribute attacks targeting patients and model extraction attacks targeting doctors, and specify the corresponding privacy principles. We then propose a privacy-preserving mechanism and integrate it into a novel one-shot distributed learning framework, aiming to simultaneously meet both privacy requirements and prediction performance objectives. Within the framework of statistical learning theory, we theoretically demonstrate that the proposed distributed learning framework can achieve the optimal prediction performance under specific privacy requirements. We further validate the developed privacy-preserving collaborative medical prediction platform through both toy simulations and real-world data experiments.

artificial intelligence, data mining, machine learning, (15 more...)

2507.11187

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Health Care Technology > Medical Record (0.86)
Banking & Finance > Trading > Prediction Market (0.81)
Health & Medicine > Therapeutic Area > Oncology (0.67)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

arXiv.org Artificial IntelligenceJul-9-2025

A Survey on Model Extraction Attacks and Defenses for Large Language Models

Zhao, Kaixiang, Li, Lincan, Ding, Kaize, Gong, Neil Zhenqiang, Zhao, Yue, Dong, Yushun

Model extraction attacks pose significant security threats to deployed language models, potentially compromising intellectual property and user privacy. This survey provides a comprehensive taxonomy of LLM-specific extraction attacks and defenses, categorizing attacks into functionality extraction, training data extraction, and prompt-targeted attacks. We analyze various attack methodologies including API-based knowledge distillation, direct querying, parameter recovery, and prompt stealing techniques that exploit transformer architectures. We then examine defense mechanisms organized into model protection, data privacy protection, and prompt-targeted strategies, evaluating their effectiveness across different deployment scenarios. We propose specialized metrics for evaluating both attack effectiveness and defense performance, addressing the specific challenges of generative language models. Through our analysis, we identify critical limitations in current approaches and propose promising research directions, including integrated attack methodologies and adaptive defense mechanisms that balance security with model utility. This work serves NLP researchers, ML engineers, and security professionals seeking to protect language models in production environments.

arxiv preprint arxiv, large language model, machine learning, (15 more...)

2506.22521

Country: North America > United States > California > Los Angeles County > Los Angeles (0.28)

Genre:

Overview (1.00)
Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

arXiv.org Artificial IntelligenceJun-4-2025

How Explanations Leak the Decision Logic: Stealing Graph Neural Networks via Explanation Alignment

Ma, Bin, Feng, Yuyuan, Lin, Minhua, Dai, Enyan

Graph Neural Networks (GNNs) have become essential tools for analyzing graph-structured data in domains such as drug discovery and financial analysis, leading to growing demands for model transparency. Recent advances in explainable GNNs have addressed this need by revealing important subgraphs that influence predictions, but these explanation mechanisms may inadvertently expose models to security risks. This paper investigates how such explanations potentially leak critical decision logic that can be exploited for model stealing. We propose {\method}, a novel stealing framework that integrates explanation alignment for capturing decision logic with guided data augmentation for efficient training under limited queries, enabling effective replication of both the predictive behavior and underlying reasoning patterns of target models. Experiments on molecular graph datasets demonstrate that our approach shows advantages over conventional methods in model stealing. This work highlights important security considerations for the deployment of explainable GNNs in sensitive domains and suggests the need for protective measures against explanation-based attacks. Our code is available at https://github.com/beanmah/EGSteal.

artificial intelligence, explanation, machine learning, (14 more...)

2506.03087

Country:

North America > United States > Pennsylvania (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China > Hong Kong (0.04)
(2 more...)

Genre: Research Report > New Finding (0.67)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Artificial IntelligenceJun-4-2025

MISLEADER: Defending against Model Extraction with Ensembles of Distilled Models

Cheng, Xueqi, Zheng, Minxing, Zhu, Shixiang, Dong, Yushun

Model extraction attacks aim to replicate the functionality of a black-box model through query access, threatening the intellectual property (IP) of machine-learning-as-a-service (MLaaS) providers. Defending against such attacks is challenging, as it must balance efficiency, robustness, and utility preservation in the real-world scenario. Despite the recent advances, most existing defenses presume that attacker queries have out-of-distribution (OOD) samples, enabling them to detect and disrupt suspicious inputs. However, this assumption is increasingly unreliable, as modern models are trained on diverse datasets and attackers often operate under limited query budgets. As a result, the effectiveness of these defenses is significantly compromised in realistic deployment scenarios. To address this gap, we propose MISLEADER (enseMbles of dIStiLled modEls Against moDel ExtRaction), a novel defense strategy that does not rely on OOD assumptions. MISLEADER formulates model protection as a bilevel optimization problem that simultaneously preserves predictive fidelity on benign inputs and reduces extractability by potential clone models. Our framework combines data augmentation to simulate attacker queries with an ensemble of heterogeneous distilled models to improve robustness and diversity. We further provide a tractable approximation algorithm and derive theoretical error bounds to characterize defense effectiveness. Extensive experiments across various settings validate the utility-preserving and extraction-resistant properties of our proposed defense strategy. Our code is available at https://github.com/LabRAI/MISLEADER.

extraction, machine learning, natural language, (19 more...)

2506.02362

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.87)