AITopics | Optimization

f5ccb3ab757131a93586ef61ec701533-Supplemental-Conference.pdf

Neural Information Processing SystemsApr-30-2026, 08:09:14 GMT

In this section, we compare the symmetric solutions found in erf [2] and ReLU networks [5] to our one-neuron solution (n =1). The main difference is that both earlier studies constrain the search space to the symmetric subspace whereas we first prove that the non-trivial critical points are contained in this subspace in Theorem 5.1 for a broad class of activation functions, including erf and ReLU. Solving the low-dimensional loss, we recover the same solution for ReLU and erf as in [2, 5] for unit-orthonormal teachers.

artificial intelligence, critical point, machine learning, (19 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.37)

Add feedback

SMF_NeurIPS_2023-6

Hanbaek Lyu

Neural Information Processing SystemsApr-30-2026, 07:09:30 GMT

artificial intelligence, bioinformatics, machine learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Wisconsin (0.28)
North America > United States > California (0.28)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Data Science (0.95)
Information Technology > Biomedical Informatics > Translational Bioinformatics (0.94)
(2 more...)

Add feedback

Knowledge Distillation Performs Partial Variance Reduction

Neural Information Processing SystemsApr-30-2026, 05:37:23 GMT

Knowledge distillation is a popular approach for enhancing the performance of "student" models, with lower representational capacity, by taking advantage of more powerful "teacher" models. Despite its apparent simplicity and widespread use, the underlying mechanics behind knowledge distillation (KD) are still not fully understood. In this work, we shed new light on the inner workings of this method, by examining it from an optimization perspective. We show that, in the context of linear and deep linear models, KD can be interpreted as a novel type of stochastic variance reduction mechanism. We provide a detailed convergence analysis of the resulting dynamics, which hold under standard assumptions for both strongly-convex and non-convex losses, showing that KD acts as a form of partial variance reduction, which can reduce the stochastic gradient noise, but may not eliminate it completely, depending on the properties of the "teacher" model. Our analysis puts further emphasis on the need for careful parametrization of KD, in particular w.r.t. the weighting of the distillation loss, and is validated empirically on both linear models and deep neural networks.

artificial intelligence, distillation, machine learning, (17 more...)

Neural Information Processing Systems

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.46)

Industry: Education (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

Add feedback

Static and Sequential Malicious Attacks in the Context of Selective Forgetting

Neural Information Processing SystemsApr-30-2026, 05:22:46 GMT

With the growing demand for the right to be forgotten, there is an increasing need for machine learning models to forget sensitive data and its impact. To address this, the paradigm of selective forgetting (a.k.a machine unlearning) has been extensively studied, which aims to remove the impact of requested data from a well-trained model without retraining from scratch. Despite its significant success, limited attention has been given to the security vulnerabilities of the unlearning system concerning malicious data update requests. Motivated by this, in this paper, we explore the possibility and feasibility of malicious data update requests during the unlearning process. Specifically, we first propose a new class of malicious selective forgetting attacks, which involves a static scenario where all the malicious data update requests are provided by the adversary at once. Additionally, considering the sequential setting where the data update requests arrive sequentially, we also design a novel framework for sequential forgetting attacks, which is formulated as a stochastic optimal control problem. We also propose novel optimization algorithms that can find the effective malicious data update requests. We perform theoretical analyses for the proposed selective forgetting attacks, and extensive experimental results validate the effectiveness of our proposed selective forgetting attacks. The source code is available in the supplementary material.

artificial intelligence, machine learning, update request, (17 more...)

Neural Information Processing Systems

Country: North America > United States (0.93)

Genre: Research Report (0.46)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

ecc28b4ce9b39f5f23c3efb03e25b7bf-Paper-Conference.pdf

Neural Information Processing SystemsApr-30-2026, 05:09:45 GMT

artificial intelligence, machine learning, optimization problem, (17 more...)

Neural Information Processing Systems

Country: Europe (0.28)

Genre:

Workflow (0.46)
Research Report > New Finding (0.45)

Industry: Information Technology > Security & Privacy (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Security & Privacy (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

ec52572b9e16b91edff5dc70e2642240-Paper-Conference.pdf

Neural Information Processing SystemsApr-30-2026, 05:08:32 GMT

artificial intelligence, machine learning, optimization problem, (16 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Add feedback

Rethinking Bias Mitigation: Fair Architectures Make for Fair Face Recognition

Neural Information Processing SystemsApr-30-2026, 04:42:20 GMT

artificial intelligence, machine learning, optimization problem, (19 more...)

Neural Information Processing Systems

Country: North America > United States (0.68)

Genre: Research Report > New Finding (0.46)

Industry: Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Resetting the Optimizer in Deep RL: An Empirical Study

Neural Information Processing SystemsApr-30-2026, 02:41:49 GMT

We focus on the task of approximating the optimal value function in deep reinforcement learning. This iterative process is comprised of solving a sequence of optimization problems where the loss function changes per iteration. The common approach to solving this sequence of problems is to employ modern variants of the stochastic gradient descent algorithm such as Adam. These optimizers maintain their own internal parameters such as estimates of the first-order and the secondorder moments of the gradient, and update them over time. Therefore, information obtained in previous iterations is used to solve the optimization problem in the current iteration. We demonstrate that this can contaminate the moment estimates because the optimization landscape can change arbitrarily from one iteration to the next one. To hedge against this negative effect, a simple idea is to reset the internal parameters of the optimizer when starting a new iteration. We empirically investigate this resetting idea by employing various optimizers in conjunction with the Rainbow algorithm. We demonstrate that this simple modification significantly improves the performance of deep RL on the Atari benchmark.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Country: North America (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

Add feedback

Bias in Evaluation Processes: An Optimization-Based Model

Neural Information Processing SystemsApr-30-2026, 02:40:51 GMT

Biases with respect to socially-salient attributes of individuals have been well documented in evaluation processes used in settings such as admissions and hiring. We view such an evaluation process as a transformation of a distribution of the true utility of an individual for a task to an observed distribution and model it as a solution to a loss minimization problem subject to an information constraint. Our model has two parameters that have been identified as factors leading to biases: the resource-information trade-off parameter in the information constraint and the risk-averseness parameter in the loss function. We characterize the distributions that arise from our model and study the effect of the parameters on the observed distribution. The outputs of our model enrich the class of distributions that can be used to capture variation across groups in the observed evaluations. We empirically validate our model by fitting real-world datasets and use it to study the effect of interventions in a downstream selection task. These results contribute to an understanding of the emergence of bias in evaluation processes and provide tools to guide the deployment of interventions to mitigate biases.

artificial intelligence, information management, machine learning, (19 more...)

Neural Information Processing Systems

Country: