AITopics

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing Systems, Nov-20-2025, 23:02:03 GMT

Do Less, Get More: Streaming Submodular Maximization with Subsampling

In this paper, we develop the first one-pass streaming algorithm for submodular maximization that does not evaluate the entire stream even once. By carefully subsampling each element of the data stream, our algorithm enjoys the tightest approximation guarantees in various settings while having the smallest memory footprint and requiring the lowest number of function evaluations. More specifically, for a monotone submodular function and a $p$-matchoid constraint, our randomized algorithm achieves a $4p$ approximation ratio (in expectation) with $O(k)$ memory and $O(km/p)$ queries per element ($k$ is the size of the largest feasible solution and $m$ is the number of matroids used to define the constraint).
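To make the "does not evaluate the entire stream" idea concrete, here is a toy sketch of subsampling in a one-pass stream. It is not the paper's algorithm: it assumes a plain cardinality constraint (at most `k` elements) instead of a $p$-matchoid, a set-coverage objective, and a naive accept-if-it-helps rule. The names `coverage`, `subsampled_stream`, and `q` are illustrative only.

```python
import random


def coverage(sets, chosen):
    """Monotone submodular objective: number of distinct items covered."""
    covered = set()
    for i in chosen:
        covered |= sets[i]
    return len(covered)


def subsampled_stream(sets, k, q, seed=0):
    """One pass over `sets`; each element is considered with probability q.

    Skipped elements cost no objective evaluation at all, which is the
    point of subsampling. Assumption: simple cardinality constraint |S| <= k.
    """
    rng = random.Random(seed)
    solution, covered, evaluated = [], set(), 0
    for idx, s in enumerate(sets):
        if rng.random() >= q:
            continue  # element dropped by subsampling, never evaluated
        evaluated += 1
        gain = len(s - covered)  # marginal gain of adding this element
        if len(solution) < k and gain > 0:
            solution.append(idx)
            covered |= s
    return solution, len(covered), evaluated


if __name__ == "__main__":
    stream = [set(range(i, i + 3)) for i in range(0, 60, 2)]
    sol, value, evaluated = subsampled_stream(stream, k=5, q=0.5, seed=1)
    print(sol, value, evaluated, "of", len(stream))
```

In this sketch the expected number of evaluated elements is `q` times the stream length, so lowering `q` trades solution quality for fewer function evaluations.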

algorithm, name change, streaming submodular maximization

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.36)

Shankaranarayana, Sharath M, Roy, Soumava Kumar, Sudhakar, Prasad, Aladahalli, Chandan

Deep Attention-guided Adaptive Subsampling

arXiv.org Artificial Intelligence, Oct-15-2025

Although deep neural networks have provided impressive gains in performance, these improvements often come at the cost of increased computational complexity and expense. In many cases, such as 3D volume or video classification tasks, not all slices or frames are necessary due to inherent redundancies. To address this issue, we propose a novel learnable subsampling framework that can be integrated into any neural network architecture. Subsampling, being a non-differentiable operation, poses significant challenges for direct adaptation into deep learning models. While some works have proposed solutions using the Gumbel-max trick to overcome the problem of non-differentiability, they fall short in a crucial aspect: they are only task-adaptive and not input-adaptive. Once the sampling mechanism is learned, it remains static and does not adjust to different inputs, making it unsuitable for real-world applications. To this end, we propose an attention-guided sampling module that adapts to inputs even during inference. This dynamic adaptation results in performance gains and reduces complexity in deep neural network models. We demonstrate the effectiveness of our method on 3D medical imaging datasets from MedMNIST3D as well as two ultrasound video datasets for classification tasks, one of them being a challenging in-house dataset collected under real-world clinical conditions.

artificial intelligence, deep learning, machine learning

2510.12376

Country: Asia > India (0.14)

Genre: Research Report (0.64)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing Systems, May-26-2025, 23:13:29 GMT

Practical Differentially Private Hyperparameter Tuning with Subsampling

Tuning the hyperparameters of differentially private (DP) machine learning (ML) algorithms often requires the use of sensitive data, and this may leak private information via the hyperparameter values. Recently, Papernot and Steinke (2022) proposed a certain class of DP hyperparameter tuning algorithms, where the number of random search samples is randomized. Commonly, these algorithms still considerably increase the DP privacy parameter $\varepsilon$ over non-tuned DP ML model training and can be computationally heavy, as evaluating each hyperparameter candidate requires a new training run. We focus on lowering both the DP bounds and the compute cost of these methods by using only a random subset of the sensitive data for the hyperparameter tuning and by appropriately extrapolating the optimal values to a larger dataset. We carry out a Rényi differential privacy analysis for the proposed method and experimentally show that it consistently leads to a better privacy-utility trade-off than the baseline method by Papernot and Steinke.
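The control flow described above (a random number of random-search trials, each run on a random subset of the data) can be sketched as follows. This is only a structural illustration under stated assumptions: the trial count is drawn from a geometric distribution, the subset is drawn by independent coin flips, and `train_and_score` is a hypothetical callback. The sketch provides no privacy guarantee by itself, since that would require `train_and_score` to be a DP training procedure, and it omits the extrapolation step to the full dataset.

```python
import random


def sample_geometric(rng, stop_prob):
    """Number of trials up to and including the first success (always >= 1)."""
    k = 1
    while rng.random() >= stop_prob:
        k += 1
    return k


def tune_on_subset(data, candidates, train_and_score,
                   subset_frac, stop_prob, seed=0):
    """Random search with a randomized trial count on a random data subset.

    Assumptions (not from the abstract): geometric trial count, independent
    per-record inclusion with probability `subset_frac`, and a hypothetical
    `train_and_score(subset, candidate) -> float` callback.
    """
    rng = random.Random(seed)
    subset = [x for x in data if rng.random() < subset_frac]
    num_trials = sample_geometric(rng, stop_prob)
    best_cand, best_score = None, float("-inf")
    for _ in range(num_trials):
        cand = rng.choice(candidates)  # random search, with replacement
        score = train_and_score(subset, cand)
        if score > best_score:
            best_cand, best_score = cand, score
    return best_cand, best_score, num_trials, len(subset)


if __name__ == "__main__":
    # Toy stand-in for a training run: prefers learning rates near 0.1.
    toy_score = lambda subset, lr: -abs(lr - 0.1)
    print(tune_on_subset(list(range(1000)), [0.01, 0.1, 1.0],
                         toy_score, subset_frac=0.1, stop_prob=0.2, seed=3))
```

Each trial touches only the subset, so in this sketch the per-trial compute cost shrinks roughly in proportion to `subset_frac`.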

papernot and steinke, practical differentially private hyperparameter tuning, subsampling

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.84)

Dong, Andy, Chen, Wei-Ning, Ozgur, Ayfer

Leveraging Randomness in Model and Data Partitioning for Privacy Amplification

arXiv.org Artificial Intelligence, Mar-4-2025

We study how inherent randomness in the training process -- where each sample (or client in federated learning) contributes only to a randomly selected portion of training -- can be leveraged for privacy amplification. This includes (1) data partitioning, where a sample participates in only a subset of training iterations, and (2) model partitioning, where a sample updates only a subset of the model parameters. We apply our framework to model parallelism in federated learning, where each client updates a randomly selected subnetwork to reduce memory and computational overhead, and show that existing methods, e.g. model splitting or dropout, provide a significant privacy amplification gain not captured by previous privacy analysis techniques. Additionally, we introduce Balanced Iteration Subsampling, a new data partitioning method where each sample (or client) participates in a fixed number of training iterations. We show that this method yields stronger privacy amplification than Poisson (i.i.d.) sampling of data (or clients). Our results demonstrate that randomness in the training process, which is structured rather than i.i.d. and interacts with data in complex ways, can be systematically leveraged for significant privacy amplification.
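The difference between the two data partitioning schemes can be illustrated with a small sketch. It assumes that Balanced Iteration Subsampling picks, for each sample, a uniformly random set of exactly `k` of the `num_iters` iterations; the abstract only states that the number of iterations is fixed, so the uniform choice and the function names are assumptions made here for illustration.

```python
import random


def balanced_schedule(num_samples, num_iters, k, seed=0):
    """Each sample joins exactly k iterations (assumed: chosen uniformly)."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(num_iters), k))
            for _ in range(num_samples)]


def poisson_schedule(num_samples, num_iters, k, seed=0):
    """Each sample joins each iteration independently with rate k / num_iters."""
    rng = random.Random(seed)
    rate = k / num_iters
    return [[t for t in range(num_iters) if rng.random() < rate]
            for _ in range(num_samples)]


if __name__ == "__main__":
    bal = balanced_schedule(5, num_iters=20, k=5, seed=1)
    poi = poisson_schedule(5, num_iters=20, k=5, seed=1)
    print("balanced counts:", [len(s) for s in bal])
    print("poisson counts: ", [len(s) for s in poi])
```

Both schemes have the same expected participation per sample, but only the balanced one fixes the count exactly; under Poisson sampling the count fluctuates from sample to sample.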

iteration, model and data partitioning, theorem 3

2503.03043

Country: North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Mittal, Prateek, Dalmotra, Jai, Chauhan, Joohi

Adaptive and Stratified Subsampling Techniques for High Dimensional Non-Standard Data Environments

arXiv.org Artificial Intelligence, Oct-16-2024

In the era of big data, researchers and practitioners across various domains are grappling with datasets of unprecedented scale and complexity. These high-dimensional datasets, characterized by a large number of features relative to the sample size, pose significant challenges to traditional statistical methods. Simultaneously, the increasing prevalence of non-standard data environments, such as those with heavy-tailed distributions or complex dependence structures, further complicates the landscape of data analysis. Subsampling techniques have emerged as a promising approach to address the computational challenges associated with large-scale data analysis. By working with a carefully chosen subset of the data, these methods aim to achieve a balance between statistical accuracy and computational efficiency. However, the theoretical foundations of subsampling in high-dimensional, non-standard environments remain inadequately explored, leaving a critical gap in our understanding of their statistical properties and practical applicability.

artificial intelligence, data mining, machine learning

2410.12367

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science > Data Mining (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)

arXiv.org Machine Learning, Sep-6-2024

Notes on Sampled Gaussian Mechanism

Kalinin, Nikita P.

In these notes, we prove a recent conjecture posed in the paper by Räisä, O. et al. [Subsampling is not Magic: Why Large Batch Sizes Work for Differentially Private Stochastic Optimization (2024)]. Theorem 6.2 of the paper asserts that for the Sampled Gaussian Mechanism (a composition of subsampling and additive Gaussian noise), the effective noise level, $\sigma_{\text{eff}} = \frac{\sigma(q)}{q}$, decreases as a function of the subsampling rate $q$. Consequently, larger subsampling rates are preferred for better privacy-utility trade-offs. Our notes provide a rigorous proof of Conjecture 6.3, which was left unresolved in the original paper, thereby completing the proof of Theorem 6.2.
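One way to see why the ratio $\sigma(q)/q$ is the relevant quantity is the following short derivation. It assumes the usual DP-SGD form of the mechanism, which the abstract does not spell out: per-example gradients $g_i$ clipped to norm $C$, a batch $B$ formed by including each of $N$ examples independently with probability $q$, and $\sigma(q)$ read as the noise multiplier needed to meet a fixed privacy target at rate $q$. The symbols $C$, $N$, $B$, and $\xi$ are introduced here for illustration.

```latex
\begin{align*}
\tilde{g} &= \frac{1}{qN}\Bigl(\sum_{i \in B} \operatorname{clip}_C(g_i)
             + \sigma(q)\, C\, \xi\Bigr),
  \qquad \xi \sim \mathcal{N}(0, I), \\
\text{noise std per coordinate of } \tilde{g}
  &= \frac{\sigma(q)\, C}{qN}
   = \frac{C}{N} \cdot \frac{\sigma(q)}{q}
   = \frac{C}{N}\, \sigma_{\text{eff}} .
\end{align*}
```

Under these assumptions, the noise added to the normalized gradient estimate scales with $\sigma_{\text{eff}}$ rather than with $\sigma(q)$ alone, so a decreasing $\sigma_{\text{eff}}$ means less noise per update at larger $q$.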

conjecture, sampled gaussian mechanism, theorem 6

2409.04636

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.50)

Peck, Jonathan, Goossens, Bart

Robust width: A lightweight and certifiable adversarial defense

arXiv.org Artificial Intelligence, May-24-2024

Deep neural networks are vulnerable to so-called adversarial examples: inputs which are intentionally constructed to cause the model to make incorrect predictions or classifications. Adversarial examples are often visually indistinguishable from natural data samples, making them hard to detect. As such, they pose significant threats to the reliability of deep learning systems. In this work, we study an adversarial defense based on the robust width property (RWP), which was recently introduced for compressed sensing. We show that a specific input purification scheme based on the RWP gives theoretical robustness guarantees for images that are approximately sparse. The defense is easy to implement and can be applied to any existing model without additional training or finetuning. We empirically validate the defense on ImageNet against $L^\infty$ perturbations at perturbation budgets ranging from $4/255$ to $32/255$. In the black-box setting, our method significantly outperforms the state of the art, especially for large perturbations. In the white-box setting, depending on the choice of base classifier, we closely match the state of the art in robust ImageNet classification while avoiding the need for additional data, larger models or expensive adversarial training routines. Our code is available at https://github.com/peck94/robust-width-defense.

classifier, perturbation, robustness

2405.15971

Country:

North America > United States > New York > New York County > New York City (0.04)
Asia > Middle East > Israel (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Transportation (0.49)
Health & Medicine (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Schuchardt, Jan, Stoian, Mihail, Kosmala, Arthur, Günnemann, Stephan

Group Privacy Amplification and Unified Amplification by Subsampling for Rényi Differential Privacy

arXiv.org Machine Learning, Mar-7-2024

Differential privacy (DP) has various desirable properties, such as robustness to post-processing, group privacy, and amplification by subsampling, which can be derived independently of each other. Our goal is to determine whether stronger privacy guarantees can be obtained by considering several of these properties jointly. To this end, we focus on the combination of group privacy and amplification by subsampling. To provide guarantees that are amenable to machine learning algorithms, we conduct our analysis in the framework of Rényi-DP, which has more favorable composition properties than $(\epsilon,\delta)$-DP. As part of this analysis, we develop a unified framework for deriving amplification by subsampling guarantees for Rényi-DP, which represents the first such framework for a privacy accounting method and is of independent interest. We find that it not only lets us improve upon and generalize existing amplification results for Rényi-DP, but also derive provably tight group privacy amplification guarantees stronger than existing principles. These results establish the joint study of different DP properties as a promising research direction.
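For orientation, the two properties can be combined naively in the simpler pure $\epsilon$-DP setting, which is the kind of independent derivation the abstract aims to improve on. The sketch below uses the standard amplification bound for Poisson subsampling at rate `q`, $\epsilon' = \log(1 + q(e^{\epsilon} - 1))$, and the standard group privacy bound $k\epsilon$ for groups of size $k$. It is not the paper's Rényi-DP analysis, and the function names are illustrative.

```python
import math


def amplified_epsilon(eps, q):
    """Standard pure-DP amplification bound for Poisson subsampling at rate q."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    return math.log1p(q * math.expm1(eps))


def group_epsilon(eps, group_size):
    """Standard group privacy bound for pure eps-DP: eps scales linearly."""
    return group_size * eps


if __name__ == "__main__":
    eps, q, k = 1.0, 0.1, 3
    eps_sub = amplified_epsilon(eps, q)
    # Naive combination: apply group privacy to the amplified guarantee.
    print(eps_sub, group_epsilon(eps_sub, k))
```

The naive combination treats the subsampled mechanism as a black box; a joint analysis, as studied in the paper, can exploit the fact that a group of records rarely lands in the same batch all at once.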

group privacy amplification, privacy amplification and unified amplification, subsampling

2403.04867

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
Europe > Germany > Bavaria > Middle Franconia > Nuremberg (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)