Appendix
The following lemma establishes the convergence of the SGD framework when the gradient estimator v(x) is unbiased and has bounded variance. In contrast, these papers assumed a bound on the norm of the gradient estimator ‖v(x)‖; one can also refer to the recent survey [6] for more general results on SGD. When the variance of v(x) is of order O(ϵ), one can use stepsizes that are independent of ϵ to guarantee ϵ-optimality or ϵ-stationarity, in which case the algorithm behaves much like gradient descent. We prove the case when F(x) is convex; suppose that the claim holds for iteration t. This section analyzes the bias, variance, and per-iteration cost of the L-SGD and MLMC-based gradient estimators.
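As a concrete illustration of the claim above, the following sketch (our own construction, not code from the paper) runs SGD with an unbiased, small-variance gradient estimator and a constant stepsize on a convex quadratic; when the estimator's variance is small, a fixed stepsize already drives the iterate close to the optimum, much like plain gradient descent:

```python
import numpy as np

# Minimal sketch (illustrative, not the paper's setting): SGD on the convex
# quadratic F(x) = 0.5 * ||x - x_star||^2 with an unbiased gradient
# estimator v(x) = grad F(x) + noise of bounded variance sigma^2.
def sgd(x0, x_star, sigma, steps, lr, seed=0):
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    x_star = np.array(x_star, dtype=float)
    for _ in range(steps):
        grad = x - x_star                                 # exact gradient
        v = grad + sigma * rng.standard_normal(x.shape)   # unbiased estimator
        x -= lr * v
    return x

# With small variance, a stepsize independent of the accuracy target
# already brings the iterate close to the optimum.
x_final = sgd(x0=[5.0, -3.0], x_star=[1.0, 2.0], sigma=0.01, steps=500, lr=0.1)
```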
HYDRA: Pruning Adversarially Robust Neural Networks
In safety-critical but computationally resource-constrained applications, deep learning faces two key challenges: lack of robustness against adversarial attacks and large neural network size (often millions of parameters). While the research community has extensively explored the use of robust training and network pruning independently to address one of these challenges, only a few recent works have studied them jointly. However, these works inherit a heuristic pruning strategy that was developed for benign training, which performs poorly when integrated with robust training techniques, including adversarial training and verifiable robust training. To overcome this challenge, we propose to make pruning techniques aware of the robust training objective and let the training objective guide the search for which connections to prune. We realize this insight by formulating the pruning objective as an empirical risk minimization problem which is solved efficiently using SGD.
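As a rough sketch of this formulation (our own construction; HYDRA itself operates on deep networks with robust training losses), the following assigns a learnable importance score to each weight of a linear model, keeps only the top-k scores in a binary mask, and updates the scores by SGD through a straight-through estimator, so the empirical risk drives the search for which connections to prune:

```python
import numpy as np

# Minimal sketch (assumed toy setup, not the authors' code) of score-based
# pruning as empirical risk minimization: each pretrained weight w_i gets a
# learnable importance score s_i, the mask keeps the top-k scores, and the
# scores are updated by SGD with a straight-through gradient estimate.
def topk_mask(scores, k):
    mask = np.zeros_like(scores)
    mask[np.argsort(-np.abs(scores))[:k]] = 1.0
    return mask

def prune_by_erm(X, y, w, k, steps=200, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    s = 0.01 * rng.standard_normal(w.shape)      # random score initialization
    for _ in range(steps):
        m = topk_mask(s, k)
        resid = X @ (m * w) - y                  # squared-error risk residual
        grad_masked_w = X.T @ resid / len(y)     # d(risk)/d(m * w)
        # Straight-through estimator: treat d(m * w)/d(s) as w.
        s -= lr * grad_masked_w * w
    return topk_mask(s, k)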
Author Feedback
BBBVI took about 3 hours per dataset; the NOMT took less than 5 seconds per dataset. Reviewer 3 noted that the spike-and-slab model does not satisfy the non-overlapping support assumption of Theorem 1. Reviewer 2 pointed out an interesting asymmetry in Theorem 1 with respect to component K; a "symmetric" version of the theorem would be possible, but it would describe a … Reviewer 2 also suggested using reconstruction error as a metric for the sparse PCA application. I will include a discussion of these similarities and differences in the revision. Reviewer 1 asked whether the supports of the mixture distributions must be defined a priori.
Fast Iterative Hard Thresholding Methods with Pruning Gradient Computations
Yasutoshi Ida
We accelerate the iterative hard thresholding (IHT) method, which finds k important elements from a parameter vector in a linear regression model. Although the plain IHT repeatedly updates the parameter vector during the optimization, computing gradients is the main bottleneck. Our method safely prunes unnecessary gradient computations to reduce the processing time. The main idea is to efficiently construct a candidate set, which contains k important elements in the parameter vector, for each iteration. Specifically, before computing the gradients, we prune unnecessary elements in the parameter vector for the candidate set by utilizing upper bounds on absolute values of the parameters. Our method guarantees the same optimization results as the plain IHT because our pruning is safe. Experiments show that our method is up to 73 times faster than the plain IHT without degrading accuracy.
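For reference, the plain IHT baseline can be sketched as follows (a minimal illustration of the baseline under a toy noiseless linear-regression setup, not the authors' accelerated method): each iteration takes a full gradient step, the stated bottleneck, and then keeps only the k largest-magnitude entries of the parameter vector:

```python
import numpy as np

# Minimal sketch (illustrative baseline): plain iterative hard thresholding
# for sparse linear regression, min ||Xw - y||^2 subject to ||w||_0 <= k.
def hard_threshold(w, k):
    out = np.zeros_like(w)
    idx = np.argsort(-np.abs(w))[:k]   # indices of the k largest magnitudes
    out[idx] = w[idx]
    return out

def iht(X, y, k, steps=200, lr=0.1):
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)   # full gradient: the bottleneck
        w = hard_threshold(w - lr * grad, k)
    return w
```

The accelerated method described above avoids computing many of these gradient entries by first pruning elements that provably cannot enter the top-k candidate set.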
Improving Self-Supervised Learning by Characterizing Idealized Representations
Despite the empirical successes of self-supervised learning (SSL) methods, it is unclear what characteristics of their representations lead to high downstream accuracies. In this work, we characterize properties that SSL representations should ideally satisfy. Specifically, we prove necessary and sufficient conditions such that for any task invariant to given data augmentations, desired probes (e.g., linear or MLP) trained on that representation attain perfect accuracy. These requirements lead to a unifying conceptual framework for improving existing SSL methods and deriving new ones. For contrastive learning, our framework prescribes simple but significant improvements to previous methods such as using asymmetric projection heads. For non-contrastive learning, we use our framework to derive a simple and novel objective. Our resulting SSL algorithms outperform baselines on standard benchmarks, including SwAV+multicrops on linear probing of ImageNet.
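To make the asymmetric-projection-head idea concrete, here is a minimal forward-pass sketch (our own construction, not the paper's exact objective) of an InfoNCE-style contrastive loss in which the two augmented views pass through different projection matrices before similarities are computed:

```python
import numpy as np

# Minimal sketch (assumed setup): InfoNCE-style contrastive loss with an
# asymmetric pair of projection heads, W_a for view 1 and W_b for view 2.
def l2_normalize(z, axis=-1):
    return z / np.linalg.norm(z, axis=axis, keepdims=True)

def info_nce_asymmetric(h1, h2, W_a, W_b, temperature=0.1):
    # Asymmetry: the two views use different (here linear) projection heads.
    z1 = l2_normalize(h1 @ W_a)
    z2 = l2_normalize(h2 @ W_b)
    logits = z1 @ z2.T / temperature                 # pairwise similarities
    # Cross-entropy with positive pairs on the diagonal.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))
```

In the paper's full method the heads are nonlinear and trained end-to-end; this fragment only shows where the asymmetry enters the objective.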