AITopics | Optimization

Distillation Scaling Laws

Busbridge, Dan, Shidani, Amitis, Weers, Floris, Ramapuram, Jason, Littwin, Etai, Webb, Russ

arXiv.org Machine Learning | Feb-12-2025

We provide a distillation scaling law that estimates distilled model performance based on a compute budget and its allocation between the student and teacher. Our findings reduce the risks associated with using distillation at scale; compute allocation for both the teacher and student models can now be done to maximize student performance. We provide compute optimal distillation recipes for when 1) a teacher exists, or 2) a teacher needs training. If many students are to be distilled, or a teacher already exists, distillation outperforms supervised pretraining until a compute level which grows predictably with student size. If one student is to be distilled and a teacher also needs training, supervised learning should be done instead. Additionally, we provide insights across our large scale study of distillation, which increase our understanding of distillation and inform experimental design.

distillation, large language model, machine learning, (20 more...)

arXiv.org Machine Learning

2502.08606

Country:

Europe > Austria > Vienna (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
(21 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Education > Educational Technology > Educational Software (0.48)
Education > Assessment & Standards > Student Performance (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)

WENDy for Nonlinear-in-Parameter ODEs

Rummel, Nic, Messenger, Daniel A., Becker, Stephen, Dukic, Vanja, Bortz, David M.

arXiv.org Machine Learning | Feb-12-2025

The Weak-form Estimation of Non-linear Dynamics (WENDy) algorithm is extended to accommodate systems of ordinary differential equations that are nonlinear-in-parameters (NiP). The extension rests on derived analytic expressions for a likelihood function, its gradient and its Hessian matrix. WENDy makes use of these to approximate a maximum likelihood estimator based on optimization routines suited for non-convex optimization problems. The resulting parameter estimation algorithm has better accuracy, a substantially larger domain of convergence, and is often orders of magnitude faster than the conventional output error least squares method (based on forward solvers). The WENDy.jl algorithm is efficiently implemented in Julia. We demonstrate the algorithm's ability to accommodate the weak form optimization for both additive normal and multiplicative log-normal noise, and present results on a suite of benchmark systems of ordinary differential equations. In order to demonstrate the practical benefits of our approach, we present extensive comparisons between our method and output error methods in terms of accuracy, precision, bias, and coverage.

artificial intelligence, machine learning, variance 0, (18 more...)

arXiv.org Machine Learning

2502.08881

Country:

North America > United States > New York > New York County > New York City (0.14)
North America > United States > Colorado > Boulder County > Boulder (0.14)
North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
(3 more...)

Genre:

Research Report (0.64)
Overview (0.45)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

A First-order Generative Bilevel Optimization Framework for Diffusion Models

Xiao, Quan, Yuan, Hui, Saif, A F M, Liu, Gaowen, Kompella, Ramana, Wang, Mengdi, Chen, Tianyi

arXiv.org Machine Learning | Feb-12-2025

Diffusion models, which iteratively denoise data samples to synthesize high-quality outputs, have achieved empirical success across domains. However, optimizing these models for downstream tasks often involves nested bilevel structures, such as tuning hyperparameters for fine-tuning tasks or noise schedules in training dynamics, where traditional bilevel methods fail due to the infinite-dimensional probability space and prohibitive sampling costs. We formalize this challenge as a generative bilevel optimization problem and address two key scenarios: (1) fine-tuning pre-trained models via an inference-only lower-level solver paired with a sample-efficient gradient estimator for the upper level, and (2) training diffusion models from scratch with noise schedule optimization by reparameterizing the lower-level problem and designing a computationally tractable gradient estimator. Our first-order bilevel framework overcomes the incompatibility of conventional bilevel methods with diffusion processes, offering theoretical grounding and computational practicality. Experiments demonstrate that our method outperforms existing fine-tuning and hyperparameter search baselines.

artificial intelligence, first-order generative bilevel optimization framework, machine learning, (3 more...)

arXiv.org Machine Learning

2502.08808

Country: North America > United States (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.80)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)

Safety at Scale: A Comprehensive Survey of Large Model Safety

Ma, Xingjun, Gao, Yifeng, Wang, Yixu, Wang, Ruofan, Wang, Xin, Sun, Ye, Ding, Yifan, Xu, Hengyuan, Chen, Yunhao, Zhao, Yunhan, Huang, Hanxun, Li, Yige, Zhang, Jiaming, Zheng, Xiang, Bai, Yang, Wu, Zuxuan, Qiu, Xipeng, Zhang, Jingfeng, Li, Yiming, Sun, Jun, Wang, Cong, Gu, Jindong, Wu, Baoyuan, Chen, Siheng, Zhang, Tianwei, Liu, Yang, Gong, Mingming, Liu, Tongliang, Pan, Shirui, Xie, Cihang, Pang, Tianyu, Dong, Yinpeng, Jia, Ruoxi, Zhang, Yang, Ma, Shiqing, Zhang, Xiangyu, Gong, Neil, Xiao, Chaowei, Erfani, Sarah, Li, Bo, Sugiyama, Masashi, Tao, Dacheng, Bailey, James, Jiang, Yu-Gang

arXiv.org Artificial Intelligence | Feb-12-2025

The rapid advancement of large models, driven by their exceptional abilities in learning and generalization through large-scale pre-training, has reshaped the landscape of Artificial Intelligence (AI). These models are now foundational to a wide range of applications, including conversational AI, recommendation systems, autonomous driving, content generation, medical diagnostics, and scientific discovery. However, their widespread deployment also exposes them to significant safety risks, raising concerns about robustness, reliability, and ethical implications. This survey provides a systematic review of current safety research on large models, covering Vision Foundation Models (VFMs), Large Language Models (LLMs), Vision-Language Pre-training (VLP) models, Vision-Language Models (VLMs), Diffusion Models (DMs), and large-model-based Agents. Our contributions are summarized as follows: (1) We present a comprehensive taxonomy of safety threats to these models, including adversarial attacks, data poisoning, backdoor attacks, jailbreak and prompt injection attacks, energy-latency attacks, data and model extraction attacks, and emerging agent-specific threats. (2) We review the defense strategies proposed for each type of attack, where available, and summarize the commonly used datasets and benchmarks for safety research. (3) Building on this, we identify and discuss the open challenges in large model safety, emphasizing the need for comprehensive safety evaluations, scalable and effective defense mechanisms, and sustainable data practices. More importantly, we highlight the necessity of collective efforts from the research community and international collaboration. Our work can serve as a useful reference for researchers and practitioners, fostering the ongoing development of comprehensive defense systems and platforms to safeguard AI models.

adversarial example, large language model, machine learning, (23 more...)

arXiv.org Artificial Intelligence

2502.05206

Country:

Asia > China > Hong Kong (0.04)
Asia > Middle East > Jordan (0.04)
Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
(11 more...)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.45)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(3 more...)

Review for NeurIPS paper: An efficient nonconvex reformulation of stagewise convex optimization problems

Neural Information Processing Systems | Feb-11-2025, 22:19:48 GMT

Additional Feedback: Update after rebuttal. After reading the rebuttal, I am happy with the answers regarding PDHG and with the explanations of the proof for the backtracking/momentum variants of the method. I find the approach promising. However, I think the presentation needs improvement, which I suggest the authors address when revising the paper. In particular, the introduction of the idea could be made friendlier to readers, with more explanation. I also suggest the authors consider the minor questions in my review that are not addressed in the rebuttal.

efficient nonconvex reformulation, rebuttal, stagewise convex optimization problem, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.40)

Review for NeurIPS paper: An efficient nonconvex reformulation of stagewise convex optimization problems

Neural Information Processing Systems | Feb-11-2025, 22:19:41 GMT

The paper considers structured convex optimization where constraints are given in a stage-wise manner. The paper studies a non-convex reformulation for this problem and proposes new algorithms to ensure convergence to global minimizers for both non-degenerate and degenerate cases. The reformulation is proven effective in theory and experiments. The author feedback phase has clarified several aspects, resulting in a consensus on weak acceptance. We hope the detailed feedback with improvement suggestions from the 4 reviews will be implemented for the camera ready version, in particular about the clarity and readability of the paper.

artificial intelligence, optimization problem, reformulation, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.40)

Satisfying Real-world Goals with Dataset Constraints

Neural Information Processing Systems | Feb-11-2025, 20:32:43 GMT

The goal of minimizing misclassification error on a training set is often just one of several real-world goals that might be defined on different datasets. For example, one may require a classifier to also make positive predictions at some specified rate for some subpopulation (fairness), or to achieve a specified empirical recall. Other real-world goals include reducing churn with respect to a previously deployed model, or stabilizing online training. In this paper we propose handling multiple goals on multiple datasets by training with dataset constraints, using the ramp penalty to accurately quantify costs, and present an efficient algorithm to approximately optimize the resulting non-convex constrained optimization problem. Experiments on both benchmark and real-world industry datasets demonstrate the effectiveness of our approach.
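To make the idea of a dataset constraint with a ramp penalty concrete, here is a minimal sketch. It assumes the ramp takes the form max(0, min(1, 1/2 + z)), which the abstract does not state, and the helper names (`ramp`, `positive_rate`, `ramp_positive_rate`, `rate_shortfall`) are hypothetical. It shows how a bounded ramp can stand in for the 0/1 indicator when measuring the positive prediction rate on a subpopulation, and how far that rate falls short of a target.

```python
def ramp(z):
    """Assumed ramp form: a bounded, piecewise-linear stand-in for 1[z > 0]."""
    return max(0.0, min(1.0, 0.5 + z))


def positive_rate(scores):
    """Exact fraction of examples whose classifier score is positive."""
    return sum(1 for s in scores if s > 0) / len(scores)


def ramp_positive_rate(scores):
    """Ramp-based estimate of the positive prediction rate."""
    return sum(ramp(s) for s in scores) / len(scores)


def rate_shortfall(scores, target_rate):
    """Violation of the constraint: ramp positive rate >= target_rate."""
    return max(0.0, target_rate - ramp_positive_rate(scores))


# Hypothetical classifier scores on one subpopulation.
subgroup_scores = [-2.0, -0.25, 0.25, 2.0]
exact = positive_rate(subgroup_scores)
approx = ramp_positive_rate(subgroup_scores)
shortfall = rate_shortfall(subgroup_scores, target_rate=0.8)
```

Unlike a hinge, the ramp saturates at 0 and 1, so a single example with a very large score cannot dominate the estimated rate; the price is that the resulting constraint is non-convex.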

dataset constraint, real-world goal

Neural Information Processing Systems

Industry: Education > Educational Setting > Online (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)
Information Technology > Artificial Intelligence > Machine Learning (0.47)

Maximizing Influence in an Ising Network: A Mean-Field Optimal Solution

Neural Information Processing Systems | Feb-11-2025, 18:54:03 GMT

Influence maximization in social networks has typically been studied in the context of contagion models and irreversible processes. In this paper, we consider an alternate model that treats individual opinions as spins in an Ising system at dynamic equilibrium. We formalize the Ising influence maximization problem, which has a natural physical interpretation as maximizing the magnetization given a budget of external magnetic field. Under the mean-field (MF) approximation, we present a gradient ascent algorithm that uses the susceptibility to efficiently calculate local maxima of the magnetization, and we develop a number of sufficient conditions for when the MF magnetization is concave and our algorithm converges to a global optimum. We apply our algorithm on random and real-world networks, demonstrating, remarkably, that the MF optimal external fields (i.e., the external fields which maximize the MF magnetization) exhibit a phase transition from focusing on high-degree individuals at high temperatures to focusing on low-degree individuals at low temperatures.
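The gradient ascent described above can be sketched on a toy network. This is an illustration under stated assumptions, not the authors' implementation: the budget is simplified to a nonnegative field with a fixed total, the coupling matrix is assumed symmetric, both fixed-point iterations assume a high temperature (small beta) so that they converge, and all function names are hypothetical. The gradient of the total magnetization with respect to the field is taken from the column sums of the mean-field susceptibility.

```python
import math


def mean_field_magnetization(J, h, beta, iterations=200):
    """Iterate m_i = tanh(beta * (sum_j J_ij m_j + h_i)) toward a fixed point."""
    n = len(h)
    m = [0.0] * n
    for _ in range(iterations):
        m = [math.tanh(beta * (sum(J[i][j] * m[j] for j in range(n)) + h[i]))
             for i in range(n)]
    return m


def magnetization_gradient(J, m, beta, iterations=200):
    """Column sums of the susceptibility chi = (I - beta*D*J)^(-1) * beta*D.

    D = diag(1 - m_i^2). For symmetric J the column sums are beta * D_j * v_j,
    where v solves v = 1 + beta * J * D * v (solved here by iteration).
    """
    n = len(m)
    d = [1.0 - x * x for x in m]
    v = [1.0] * n
    for _ in range(iterations):
        v = [1.0 + beta * sum(J[i][j] * d[j] * v[j] for j in range(n))
             for i in range(n)]
    return [beta * d[j] * v[j] for j in range(n)]


def project_to_budget(h, budget):
    """Euclidean projection onto {h >= 0, sum(h) = budget}."""
    theta = 0.0
    cumulative = 0.0
    for k, value in enumerate(sorted(h, reverse=True), start=1):
        cumulative += value
        shift = (cumulative - budget) / k
        if value - shift > 0:
            theta = shift
    return [max(x - theta, 0.0) for x in h]


def maximize_magnetization(J, beta, budget, steps=200, learning_rate=1.0):
    """Projected gradient ascent on the total mean-field magnetization."""
    n = len(J)
    h = [budget / n] * n
    for _ in range(steps):
        m = mean_field_magnetization(J, h, beta)
        g = magnetization_gradient(J, m, beta)
        h = project_to_budget([h[i] + learning_rate * g[i] for i in range(n)], budget)
    return h


# Toy star graph: node 0 is the hub, nodes 1-3 are leaves.
J = [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
beta, budget = 0.2, 1.0
h_uniform = [budget / 4] * 4
h_opt = maximize_magnetization(J, beta, budget)
M_uniform = sum(mean_field_magnetization(J, h_uniform, beta))
M_opt = sum(mean_field_magnetization(J, h_opt, beta))
```

On this toy star graph at beta = 0.2 (a high temperature), the sketch moves the field toward the hub, in line with the high-temperature behavior the abstract reports. It says nothing about the low-temperature regime, where these simple fixed-point iterations may fail to converge.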

magnetization, maximizing influence, mean-field optimal solution, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.43)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.40)

Reviews: Dimension-Free Iteration Complexity of Finite Sum Optimization Problems

Neural Information Processing Systems | Feb-11-2025, 18:53:13 GMT

Technical quality: The proofs derived in the paper are sound and well presented. One of the most interesting contributions is the lower bound for stochastic methods (including Stochastic Gradient Descent), which uses Yao's minimax principle, a neat and simple trick. The paper also provides some new insights, e.g.

Novelty/originality: Although the lower bounds derived in this paper are of significant interest, I nevertheless have some concerns about the way the paper is currently written, especially regarding the differences from [5], which are not clearly stated in the paper. Although the authors seem to imply that they are the first to derive dimension-free bounds, the work of [5] already derived lower bounds that hold independently of the dimension.

dimension-free iteration complexity, finite sum optimization problem, review

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.61)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.40)

Learning Supervised PageRank with Gradient-Based and Gradient-Free Optimization Methods

Neural Information Processing Systems | Feb-11-2025, 18:44:37 GMT

In this paper, we consider a non-convex loss-minimization problem of learning Supervised PageRank models, which can account for features of nodes and edges. We propose gradient-based and random gradient-free methods to solve this problem. Our algorithms are based on the concept of an inexact oracle and, unlike the state-of-the-art gradient-based method, both come with theoretical convergence rate guarantees. Finally, we compare the performance of the proposed optimization methods with the state of the art applied to a ranking task.

gradient-based and gradient-free optimization method, learning supervised pagerank

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.92)
Information Technology > Information Management > Search (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
