
We thank all three reviewers for their thorough reviews and constructive feedback

Neural Information Processing Systems

We thank all three reviewers for their thorough reviews and constructive feedback. Concern 1: Why not use full second order? Otherwise, including additional second-order information can make the results worse ("...CGD still requires that the step-size is bounded by one over the max diagonal entry of the Hessian..."). See also our answer to Reviewer #7. Concern 3: Is CGD scalable?


Critique-Guided Distillation for Efficient and Robust Language Model Reasoning

Kapusuzoglu, Berkcan, Chakraborty, Supriyo, Lee, Chia-Hsuan, Sahu, Sambit

arXiv.org Artificial Intelligence

Supervised fine-tuning (SFT) with expert demonstrations often suffers from the imitation problem, where models reproduce correct responses without internalizing the underlying reasoning. We propose Critique-Guided Distillation (CGD), a multi-stage training framework that augments SFT with teacher-generated explanatory critiques and refined responses. Instead of directly imitating teacher outputs, a student learns to map the triplet of prompt, its own initial response, and teacher critique into the refined teacher response, thereby capturing both what to output and why. Our analyses show that CGD consistently reduces refinement uncertainty, improves alignment between critiques and responses, and enhances sample efficiency. On reasoning benchmarks, CGD achieves substantial gains across LLaMA and Qwen families, including +15.0% on AMC23 and +12.2% on MATH-500, while avoiding the format drift issues observed in prior critique-based fine-tuning. Importantly, on LLaMA-3.1-8B, CGD approaches or exceeds the performance of SimpleRL-Zero, which is a DeepSeek-R1 replication, while requiring 60x less compute. Beyond reasoning, CGD maintains or improves general instruction-following and factual accuracy, matching baseline performance on IFEval, MUSR, TruthfulQA, and BBH. In contrast, prior critique-based methods degrade these capabilities (e.g., -21% on IFEval). Taken together, these results establish CGD as a robust and generalizable alternative to both conventional SFT and RL-based methods, offering a more efficient path toward advancing the reasoning and safety of large language models.
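The triplet-to-refinement mapping described above can be sketched as a data-preparation step. This is a hypothetical illustration, not the paper's actual data format: the template strings and field names below are assumptions.

```python
# Hypothetical sketch of assembling a CGD-style training pair: the student
# is fine-tuned to map (prompt, its own draft, teacher critique) to the
# teacher's refined response, instead of imitating the teacher directly.
# The template and field names are illustrative assumptions.

def build_cgd_example(prompt, student_draft, teacher_critique, refined_response):
    """Pack a CGD triplet into a single (input, target) pair for SFT."""
    source = (
        f"Problem:\n{prompt}\n\n"
        f"Initial attempt:\n{student_draft}\n\n"
        f"Critique:\n{teacher_critique}\n\n"
        f"Revised solution:"
    )
    return {"input": source, "target": refined_response}

example = build_cgd_example(
    prompt="Compute 2 + 2.",
    student_draft="2 + 2 = 5",
    teacher_critique="Arithmetic error: 2 + 2 equals 4, not 5.",
    refined_response="2 + 2 = 4",
)
```

The point of the format is that the loss is taken only against the refined response, so the student conditions on its own mistake and the critique rather than being trained to reproduce either.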


CLIP-Guided Backdoor Defense through Entropy-Based Poisoned Dataset Separation

Xu, Binyan, Yang, Fan, Dai, Xilin, Tang, Di, Zhang, Kehuan

arXiv.org Artificial Intelligence

Deep Neural Networks (DNNs) are susceptible to backdoor attacks, where adversaries poison training data to implant a backdoor into the victim model. Current backdoor defenses on poisoned data often suffer from high computational costs or low effectiveness against advanced attacks like clean-label and clean-image backdoors. To address these issues, we introduce CLIP-Guided backdoor Defense (CGD), an efficient and effective method that mitigates various backdoor attacks. CGD utilizes a publicly accessible CLIP model to identify inputs that are likely to be clean or poisoned. It then retrains the model with these inputs, using CLIP's logits as guidance to effectively neutralize the backdoor. Experiments on 4 datasets and 11 attack types demonstrate that CGD reduces attack success rates (ASRs) to below 1% while maintaining clean accuracy (CA) with a maximum drop of only 0.3%, outperforming existing defenses. Additionally, we show that clean-data-based defenses can be adapted to poisoned data using CGD. Also, CGD exhibits strong robustness, maintaining low ASRs even when employing a weaker CLIP model or when CLIP itself is compromised by a backdoor. These findings underscore CGD's exceptional efficiency, effectiveness, and applicability for real-world backdoor defense scenarios. Code: https://github.com/binyxu/CGD.
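The entropy-based separation idea in the title can be sketched in a few lines: inputs whose zero-shot logits yield a low-entropy (confident) prediction are treated as likely clean, high-entropy ones as suspicious. This is a minimal illustration of the general idea only; the threshold and the toy logits are assumptions, not values from the paper.

```python
# Minimal sketch of entropy-based dataset separation: score each sample by
# the entropy of the softmax over its (e.g., CLIP zero-shot) logits and
# split on a threshold. Threshold and logits below are illustrative
# assumptions.
import math

def softmax_entropy(logits):
    """Entropy (in nats) of the softmax distribution over class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0)

def split_by_entropy(batch_logits, threshold=0.5):
    """Partition sample indices into (likely_clean, suspicious)."""
    clean, suspicious = [], []
    for i, logits in enumerate(batch_logits):
        (clean if softmax_entropy(logits) < threshold else suspicious).append(i)
    return clean, suspicious

clean, suspicious = split_by_entropy([
    [8.0, 0.1, 0.2],   # confident prediction -> low entropy -> likely clean
    [1.0, 1.1, 0.9],   # near-uniform logits -> high entropy -> suspicious
])
```

In the full defense, the two partitions are then used differently during retraining, with CLIP's logits guiding the model away from the backdoor behavior.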


Covariant Gradient Descent

Guskov, Dmitry, Vanchurin, Vitaly

arXiv.org Artificial Intelligence

We present a manifestly covariant formulation of the gradient descent method, ensuring consistency across arbitrary coordinate systems and general curved trainable spaces. The optimization dynamics are defined using a covariant force vector and a covariant metric tensor, both computed from the first and second statistical moments of the gradients. These moments are estimated through time-averaging with an exponential weight function, which preserves linear computational complexity. We show that commonly used optimization methods such as RMSProp, Adam, and AdaBelief correspond to special limits of covariant gradient descent (CGD) and demonstrate how these methods can be further generalized and improved.
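The moment estimates the abstract describes can be sketched concretely: exponentially weighted time-averages of the gradient (first moment) and its square (second moment), from which an Adam/RMSProp-style update arises as a special case. This is a scalar, one-parameter sketch under assumed decay rates and step size, not the paper's covariant formulation.

```python
# Sketch of exponentially weighted gradient moments, the limit in which
# methods like Adam are recovered. Decay rates, step size, and the toy
# objective f(theta) = theta**2 are illustrative assumptions.
def ema_moment_step(m1, m2, grad, beta1=0.9, beta2=0.999):
    """Update exponential moving averages of grad and grad**2."""
    m1 = beta1 * m1 + (1 - beta1) * grad
    m2 = beta2 * m2 + (1 - beta2) * grad * grad
    return m1, m2

def adam_like_update(theta, m1, m2, lr=0.1, eps=1e-8):
    """Adam-style step: first moment preconditioned by sqrt of second."""
    return theta - lr * m1 / ((m2 ** 0.5) + eps)

theta, m1, m2 = 1.0, 0.0, 0.0
for _ in range(3):
    grad = 2.0 * theta          # gradient of f(theta) = theta**2
    m1, m2 = ema_moment_step(m1, m2, grad)
    theta = adam_like_update(theta, m1, m2)
```

In the covariant picture, the second moment plays the role of a metric on parameter space; the square-root preconditioning above is one particular choice of that metric.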


Reviews: Competitive Gradient Descent

Neural Information Processing Systems

This paper deals with the computation of Nash equilibria in competitive two-player games where the x player minimizes a function f(x,y) and the y player minimizes a function g(x,y). Such problems arise in a wide variety of domains, notably in training GANs, and there has been much recent interest in developing algorithms for solving them. Gradient Descent Ascent (GDA) is a natural candidate algorithm for finding Nash equilibria, but it will provably oscillate or diverge even in simple settings. As such, many recent works have modified GDA or proposed different algorithms or schemes to guarantee convergence. This paper proposes a new algorithm called Competitive Gradient Descent (CGD), which updates each player's iterates by adding the Nash equilibrium of a regularized bilinear approximation of the game at the current iterates. CGD requires Hessian-vector products, making it a second-order algorithm.
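The contrast between GDA and CGD is visible numerically on the classic bilinear game f(x,y) = x·y (x minimizes, y maximizes), where GDA provably spirals outward. The CGD step below is the closed-form solution of the regularized bilinear local game for this scalar zero-sum case; the step size is an assumed value.

```python
# Toy comparison on f(x, y) = x*y: GDA diverges, while the CGD update
# (Nash equilibrium of a regularized bilinear approximation) contracts
# toward the equilibrium (0, 0). Step size lr is an assumption.
def gda_step(x, y, lr=0.2):
    """Simultaneous gradient descent ascent on f(x, y) = x*y."""
    return x - lr * y, y + lr * x

def cgd_step(x, y, lr=0.2):
    """Closed-form CGD update for the scalar bilinear game f = x*y."""
    denom = 1.0 + lr * lr
    return x - lr * (y + lr * x) / denom, y + lr * (x - lr * y) / denom

xg, yg = 1.0, 1.0   # GDA iterate
xc, yc = 1.0, 1.0   # CGD iterate
for _ in range(50):
    xg, yg = gda_step(xg, yg)
    xc, yc = cgd_step(xc, yc)

gda_norm = (xg * xg + yg * yg) ** 0.5   # grows without bound
cgd_norm = (xc * xc + yc * yc) ** 0.5   # shrinks toward zero
```

Each GDA step multiplies the distance to the equilibrium by sqrt(1 + lr²) > 1, while each CGD step multiplies it by 1/sqrt(1 + lr²) < 1, which is exactly the oscillation-versus-convergence behavior the review describes.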


Memory-augmented Transformers can implement Linear First-Order Optimization Methods

Dutta, Sanchayan, Sra, Suvrit

arXiv.org Artificial Intelligence

We show that memory-augmented Transformers (Memformers) can implement linear first-order optimization methods such as conjugate gradient descent, momentum methods, and more generally, methods that linearly combine past gradients. Building on prior work that demonstrates how Transformers can simulate preconditioned gradient descent, we provide theoretical and empirical evidence that Memformers can learn more advanced optimization algorithms. Specifically, we analyze how memory registers in Memformers store suitable intermediate attention values allowing them to implement algorithms such as conjugate gradient. Our results show that Memformers can efficiently learn these methods by training on random linear regression tasks, even learning methods that outperform conjugate gradient. This work extends our knowledge about the algorithmic capabilities of Transformers, showing how they can learn complex optimization methods.
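Conjugate gradient is the prototypical member of the class the abstract refers to: each search direction is a linear combination of the current gradient (residual) and the previous direction, so the iterate depends linearly on all past gradients. A minimal sketch on an arbitrary 2x2 symmetric positive definite system:

```python
# Minimal conjugate gradient for A x = b with A symmetric positive
# definite, illustrating a "linear first-order method": each direction
# mixes the new residual with the previous direction. The 2x2 system is
# an arbitrary example.
def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def conjugate_gradient(A, b, iters=10):
    x = [0.0] * len(b)
    r = b[:]            # residual = b - A x = negative gradient at x = 0
    d = r[:]            # first direction is the gradient itself
    for _ in range(iters):
        Ad = matvec(A, d)
        alpha = dot(r, r) / dot(d, Ad)
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r_new = [ri - alpha * adi for ri, adi in zip(r, Ad)]
        beta = dot(r_new, r_new) / dot(r, r)
        d = [ri + beta * di for ri, di in zip(r_new, d)]  # mix in past direction
        r = r_new
        if dot(r, r) < 1e-20:
            break
    return x

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = conjugate_gradient(A, b)   # exact solution is (1/11, 7/11)
```

The memory registers in a Memformer play the role of the carried direction d: storing one past attention value per step is enough to reproduce this recurrence, which is why such methods fall within the model's expressive capacity.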


Causal Graph Dynamics and Kan Extensions

Maignan, Luidnel, Spicher, Antoine

arXiv.org Artificial Intelligence

On the one side, the formalism of Global Transformations comes with the claim of capturing any transformation of space that is local, synchronous, and deterministic. The claim has been proven for different classes of models such as mesh refinements from computer graphics, Lindenmayer systems from morphogenesis modeling, and cellular automata from biological, physical, and parallel computation modeling. The Global Transformation formalism achieves this by using category theory for its genericity, and more precisely the notion of Kan extension to determine the global behaviors based on the local ones. On the other side, Causal Graph Dynamics describe the transformation of port graphs in a synchronous and deterministic way and have not yet been tackled. In this paper, we show the precise sense in which the claim of Global Transformations holds for them as well. This is done by showing different ways in which they can be expressed as Kan extensions, each of them highlighting different features of Causal Graph Dynamics. Along the way, this work uncovers the interesting class of Monotonic Causal Graph Dynamics and their universality among General Causal Graph Dynamics.