Large Stepsizes Accelerate Gradient Descent for Regularized Logistic Regression
Wu, Jingfeng, Marion, Pierre, Bartlett, Peter
We study gradient descent (GD) with a constant stepsize for $\ell_2$-regularized logistic regression with linearly separable data. Classical theory suggests small stepsizes to ensure monotonic reduction of the optimization objective, achieving exponential convergence in $\widetilde{\mathcal{O}}(\kappa)$ steps with $\kappa$ being the condition number. Surprisingly, we show that this can be accelerated to $\widetilde{\mathcal{O}}(\sqrt{\kappa})$ by simply using a large stepsize -- for which the objective evolves nonmonotonically. The acceleration brought by large stepsizes extends to minimizing the population risk for separable distributions, improving on the best-known upper bounds on the number of steps to reach a near-optimum. Finally, we characterize the largest stepsize for the local convergence of GD, which also determines the global convergence in special scenarios. Our results extend the analysis of Wu et al. (2024) from convex settings with minimizers at infinity to strongly convex cases with finite minimizers.
- North America > United States > California > Alameda County > Berkeley (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Research Report > New Finding (0.92)
- Research Report > Experimental Study (0.62)
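As a minimal illustration of the setup in the abstract above, the NumPy sketch below runs constant-stepsize GD on $\ell_2$-regularized logistic regression over separable toy data and records the loss for a small and a large stepsize. The data, regularization strength, and stepsizes are illustrative assumptions, not the paper's constants, and this is not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linearly separable toy data: features x in R^d, labels y in {-1, +1},
# with points too close to the separator dropped so the margin is nonzero.
n, d = 200, 5
w_star = rng.normal(size=d)
X = rng.normal(size=(n, d))
raw = X @ w_star
keep = np.abs(raw) > 0.3
X, y = X[keep], np.sign(raw[keep])

lam = 1e-3  # l2-regularization strength (illustrative)

def risk(w):
    # mean log(1 + exp(-y <w, x>)) + (lam / 2) ||w||^2
    m = y * (X @ w)
    return np.mean(np.logaddexp(0.0, -m)) + 0.5 * lam * np.dot(w, w)

def grad(w):
    m = y * (X @ w)
    p = np.exp(-np.logaddexp(0.0, m))   # sigmoid(-m), overflow-safe
    return -(X.T @ (p * y)) / len(y) + lam * w

def gd(eta, steps=500):
    w = np.zeros(d)
    losses = []
    for _ in range(steps):
        losses.append(risk(w))
        w -= eta * grad(w)
    return np.array(losses)

small = gd(eta=1.0)    # conservative stepsize: the loss should decrease monotonically
large = gd(eta=50.0)   # much larger stepsize: the loss may oscillate before settling
print("monotone? small eta:", bool(np.all(np.diff(small) <= 1e-12)),
      " large eta:", bool(np.all(np.diff(large) <= 1e-12)))
print("final losses:", small[-1], large[-1])
```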
Neural Thermodynamic Laws for Large Language Model Training
Liu, Ziming, Liu, Yizhou, Gore, Jeff, Tegmark, Max
Beyond neural scaling laws, little is known about the laws underlying large language models (LLMs). We introduce Neural Thermodynamic Laws (NTL) -- a new framework that offers fresh insights into LLM training dynamics. On the theoretical side, we demonstrate that key thermodynamic quantities (e.g., temperature, entropy, heat capacity, thermal conduction) and classical thermodynamic principles (e.g., the three laws of thermodynamics and the equipartition theorem) naturally emerge under river-valley loss landscape assumptions. On the practical side, this scientific perspective yields intuitive guidelines for designing learning rate schedules.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
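The abstract does not spell out the learning-rate guidelines themselves; purely as a hypothetical placeholder for "a learning rate schedule", here is a generic warmup-stable-decay schedule in Python. The shape and every constant are assumptions for illustration, not the paper's recommendation.

```python
import math

def lr_schedule(step, total_steps, peak_lr=3e-4, final_lr=3e-5,
                warmup_frac=0.01, decay_frac=0.2):
    """Generic warmup -> stable -> decay schedule (illustration only)."""
    warmup_steps = max(1, int(warmup_frac * total_steps))
    decay_steps = max(1, int(decay_frac * total_steps))
    stable_end = total_steps - decay_steps
    if step < warmup_steps:                       # linear warmup
        return peak_lr * (step + 1) / warmup_steps
    if step < stable_end:                         # long constant ("stable") phase
        return peak_lr
    progress = (step - stable_end) / decay_steps  # cosine decay at the end
    return final_lr + 0.5 * (peak_lr - final_lr) * (1.0 + math.cos(math.pi * progress))

lrs = [lr_schedule(t, total_steps=10_000) for t in range(10_000)]
print(lrs[0], lrs[5_000], lrs[-1])
```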
Learning-based Delay Compensation for Enhanced Control of Assistive Soft Robots
Alepuz, Adrià Mompó, Papageorgiou, Dimitrios, Tolu, Silvia
Soft robots are increasingly used in healthcare, especially for assistive care, due to their inherent safety and adaptability. Controlling soft robots is challenging due to their nonlinear dynamics and the presence of time delays, especially in applications like a soft robotic arm for patient care. This paper presents a learning-based approach to approximate the nonlinear state predictor (Smith Predictor), aiming to improve tracking performance in a two-module soft robot arm with a short inherent input delay. The method uses Kernel Recursive Least Squares Tracker (KRLST) for online learning of the system dynamics and a Legendre Delay Network (LDN) to compress past input history for efficient delay compensation. Experimental results demonstrate significant improvement in tracking performance compared to a baseline model-based nonlinear controller. Statistical analysis confirms the significance of the improvements. The method is computationally efficient and adaptable online, making it suitable for real-world scenarios and highlighting its potential for enabling safer and more accurate control of soft robots in assistive care applications.
- Europe > Denmark > Capital Region > Kongens Lyngby (0.14)
- North America > Greenland (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > Switzerland (0.04)
- Health & Medicine (1.00)
- Education > Educational Setting > Online (0.48)
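One concrete piece of the pipeline above is the Legendre Delay Network, which compresses a sliding window of input history into a low-dimensional linear state. Below is a minimal NumPy sketch of that state update, using the standard Legendre (LMU-style) matrices and an exact zero-order-hold discretization; the order, window length, and test signal are illustrative assumptions, and the KRLST learning stage is omitted.

```python
import numpy as np
from scipy.linalg import expm

def ldn_matrices(order):
    """Continuous-time LDN system  theta * m'(t) = A m(t) + B u(t),
    with the standard Legendre (LMU-style) construction."""
    A = np.empty((order, order))
    for i in range(order):
        for j in range(order):
            A[i, j] = (2 * i + 1) * (-1.0 if i < j else (-1.0) ** (i - j + 1))
    B = np.array([(2 * i + 1) * (-1.0) ** i for i in range(order)]).reshape(-1, 1)
    return A, B

def ldn_run(u, order=6, theta=0.5, dt=0.01):
    """Compress the history of a scalar signal u over a window of length theta (seconds)."""
    A, B = ldn_matrices(order)
    # Exact zero-order-hold discretization of  m' = (A/theta) m + (B/theta) u.
    Ad = expm((dt / theta) * A)
    Bd = np.linalg.solve(A, (Ad - np.eye(order)) @ B)
    m = np.zeros((order, 1))
    states = []
    for u_t in u:
        m = Ad @ m + Bd * u_t
        states.append(m.ravel().copy())
    return np.array(states)                       # shape (len(u), order)

t = np.arange(0.0, 5.0, 0.01)
u = np.sin(2 * np.pi * 0.7 * t)                   # toy input signal
M = ldn_run(u)
print(M.shape)                                    # compressed input history, (500, 6)
```

The zero-order-hold step is used here simply because it stays stable regardless of the stepsize; a forward-Euler update would also work for small dt relative to theta.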
Large Stepsize Gradient Descent for Non-Homogeneous Two-Layer Networks: Margin Improvement and Fast Optimization
Cai, Yuhang, Wu, Jingfeng, Mei, Song, Lindsey, Michael, Bartlett, Peter L.
The typical training of neural networks using large stepsize gradient descent (GD) under the logistic loss often involves two distinct phases, where the empirical risk oscillates in the first phase but decreases monotonically in the second phase. We investigate this phenomenon in two-layer networks that satisfy a near-homogeneity condition. We show that the second phase begins once the empirical risk falls below a certain threshold, dependent on the stepsize. Additionally, we show that the normalized margin grows nearly monotonically in the second phase, demonstrating an implicit bias of GD in training non-homogeneous predictors. If the dataset is linearly separable and the derivative of the activation function is bounded away from zero, we show that the average empirical risk decreases, implying that the first phase must stop in finite steps. Finally, we demonstrate that by choosing a suitably large stepsize, GD that undergoes this phase transition is more efficient than GD that monotonically decreases the risk. Our analysis applies to networks of any width, beyond the well-known neural tangent kernel and mean-field regimes.
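To make the two-phase picture concrete, here is a NumPy sketch of full-batch GD with a fairly large constant stepsize on a small two-layer leaky-ReLU network (a homogeneous stand-in for the near-homogeneous predictors in the paper) under the logistic loss, printing the empirical risk and a normalized margin along the way. The architecture, data, stepsize, and choice of margin normalization are illustrative assumptions, not the paper's setting.

```python
import numpy as np

rng = np.random.default_rng(1)

# Linearly separable toy data with a nonzero margin (illustrative).
n, d, width = 100, 4, 32
w_star = rng.normal(size=d)
X = rng.normal(size=(n, d))
raw = X @ w_star
keep = np.abs(raw) > 0.3
X, y = X[keep], np.sign(raw[keep])

alpha = 0.2                                   # leaky-ReLU slope: derivative in {alpha, 1}
phi = lambda z: np.where(z > 0, z, alpha * z)
dphi = lambda z: np.where(z > 0, 1.0, alpha)

W = rng.normal(size=(width, d)) / np.sqrt(d)  # first-layer weights
a = rng.normal(size=width) / np.sqrt(width)   # second-layer weights

def risk_and_grads():
    Z = X @ W.T                               # pre-activations, shape (n, width)
    f = phi(Z) @ a                            # network outputs f(x_i)
    m = y * f                                 # margins y_i f(x_i)
    risk = np.mean(np.logaddexp(0.0, -m))     # empirical logistic risk
    lp = -np.exp(-np.logaddexp(0.0, m))       # loss derivative l'(m) = -sigmoid(-m)
    ga = phi(Z).T @ (lp * y) / len(y)                      # gradient w.r.t. a
    gW = ((lp * y)[:, None] * dphi(Z) * a).T @ X / len(y)  # gradient w.r.t. W
    return risk, ga, gW, m

eta = 4.0                                     # fairly large constant stepsize (illustrative;
                                              # too large a value can diverge)
for t in range(2001):
    risk, ga, gW, m = risk_and_grads()
    if t % 200 == 0:
        norm2 = np.sum(a ** 2) + np.sum(W ** 2)
        print(f"step {t:4d}  risk {risk:.4f}  normalized margin {m.min() / norm2:.4f}")
    a -= eta * ga
    W -= eta * gW
```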
Large Stepsize Gradient Descent for Logistic Loss: Non-Monotonicity of the Loss Improves Optimization Efficiency
Wu, Jingfeng, Bartlett, Peter L., Telgarsky, Matus, Yu, Bin
We consider gradient descent (GD) with a constant stepsize applied to logistic regression with linearly separable data, where the constant stepsize $\eta$ is so large that the loss initially oscillates. We show that GD exits this initial oscillatory phase rapidly -- in $\mathcal{O}(\eta)$ steps -- and subsequently achieves an $\tilde{\mathcal{O}}(1/(\eta t))$ convergence rate after $t$ additional steps. Our results imply that, given a budget of $T$ steps, GD can achieve an accelerated loss of $\tilde{\mathcal{O}}(1/T^2)$ with an aggressive stepsize $\eta := \Theta(T)$, without any use of momentum or variable stepsize schedulers. Our proof technique is versatile and also handles general classification loss functions (where exponential tails are needed for the $\tilde{\mathcal{O}}(1/T^2)$ acceleration), nonlinear predictors in the neural tangent kernel regime, and online stochastic gradient descent (SGD) with a large stepsize, under suitable separability conditions.
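The stepsize rule $\eta = \Theta(T)$ from the abstract is easy to try directly: the sketch below runs GD on unregularized logistic regression over separable toy data for a fixed step budget T, once with a stepsize proportional to T and once with a unit stepsize, and prints the final losses. The constants and data are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(2)

# Linearly separable toy data with a nonzero margin (illustrative).
n, d = 200, 5
w_star = rng.normal(size=d)
X = rng.normal(size=(n, d))
raw = X @ w_star
keep = np.abs(raw) > 0.5
X, y = X[keep], np.sign(raw[keep])

def logistic_risk(w):
    return np.mean(np.logaddexp(0.0, -y * (X @ w)))

def grad(w):
    m = y * (X @ w)
    p = np.exp(-np.logaddexp(0.0, m))          # sigmoid(-m), overflow-safe
    return -(X.T @ (p * y)) / len(y)

def run_gd(T, eta):
    w = np.zeros(d)
    for _ in range(T):
        w -= eta * grad(w)
    return logistic_risk(w)

# For each budget T, compare an aggressive stepsize eta ~ T with a unit stepsize.
for T in (100, 400, 1600):
    print(f"T={T:5d}  eta=0.5*T: {run_gd(T, eta=0.5 * T):.2e}   eta=1: {run_gd(T, eta=1.0):.2e}")
```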
The Cerebellum Chip: an Analog VLSI Implementation of a Cerebellar Model of Classical Conditioning
Hofstoetter, Constanze, Gil, Manuel, Eng, Kynan, Indiveri, Giacomo, Mintz, Matti, Kramer, Jörg, Verschure, Paul F.
We present a biophysically constrained cerebellar model of classical conditioning, implemented using a neuromorphic analog VLSI (aVLSI) chip. Like its biological counterpart, our cerebellar model is able to control adaptive behavior by predicting the precise timing of events. Here we describe the functionality of the chip and present its learning performance, as evaluated in simulated conditioning experiments at the circuit level and in behavioral experiments using a mobile robot. We show that this aVLSI model supports the acquisition and extinction of adaptively timed conditioned responses under real-world conditions with ultra-low power consumption.
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > United States (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- (2 more...)