On the Convergence of Step Decay Step-Size for Stochastic Optimization

Neural Information Processing Systems

The convergence of stochastic gradient descent is highly dependent on the step-size, especially on non-convex problems such as neural network training. Step decay step-size schedules (constant and then cut) are widely used in practice because of their excellent convergence and generalization qualities, but their theoretical properties are not yet well understood. We provide convergence results for step decay in the non-convex regime, ensuring that the gradient norm vanishes at an O(ln T/T) rate. We also provide near-optimal (and sometimes provably tight) convergence guarantees for general, possibly non-smooth, convex and strongly convex problems. The practical efficiency of the step decay step-size is demonstrated in several large-scale deep neural network training tasks.
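
As an illustration of the "constant and then cut" schedule discussed above, a minimal Python sketch is given below; the decay factor, cut interval, and function name are assumptions for illustration, not the exact schedule analysed in the paper.

def step_decay_lr(initial_lr, decay_factor, epochs_per_cut, epoch):
    """Step decay: keep the step-size constant within a phase and divide it
    by `decay_factor` after every `epochs_per_cut` epochs (illustrative)."""
    num_cuts = epoch // epochs_per_cut
    return initial_lr / (decay_factor ** num_cuts)

# Example: start at 0.1 and divide by 10 every 30 epochs -> 0.1, 0.01, 0.001.
for epoch in (0, 30, 60):
    print(epoch, step_decay_lr(0.1, 10, 30, epoch))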


Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach

Neural Information Processing Systems

Web-scale visual entity recognition, the task of associating images with their corresponding entities within vast knowledge bases like Wikipedia, presents significant challenges due to the lack of clean, large-scale training data. In this paper, we propose a novel methodology to curate such a dataset, leveraging a multimodal large language model (LLM) for label verification, metadata generation, and rationale explanation. Instead of relying on the multimodal LLM to directly annotate data, which we found to be suboptimal, we prompt it to reason about potential candidate entity labels by accessing additional contextually relevant information (such as Wikipedia), resulting in more accurate annotations. We further use the multimodal LLM to enrich the dataset by generating question-answer pairs and a grounded, fine-grained textual description (referred to as a "rationale") that explains the connection between images and their assigned entities. Experiments demonstrate that models trained on this automatically curated data achieve state-of-the-art performance on web-scale visual entity recognition tasks (e.g.
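
The verification step described above can be pictured with a short, hypothetical Python sketch: the multimodal LLM is not asked to label the image directly, but to reason over candidate entities together with retrieved context. `call_multimodal_llm` and `fetch_wikipedia_summary` are placeholder functions supplied by the caller, not APIs from the paper.

def verify_entity_label(image, candidate_entities, call_multimodal_llm, fetch_wikipedia_summary):
    """Prompt a multimodal LLM to choose among candidate entity labels,
    given per-candidate context retrieved from Wikipedia (illustrative)."""
    context = {e: fetch_wikipedia_summary(e) for e in candidate_entities}
    prompt = (
        "Given the image and the Wikipedia context below, pick the entity that best "
        "matches the image, explain your reasoning, and answer 'none' if no candidate fits.\n\n"
        + "\n".join(f"- {e}: {context[e]}" for e in candidate_entities)
    )
    return call_multimodal_llm(image=image, prompt=prompt)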


Grounded Reinforcement Learning: Learning to Win the Game under Human Commands - Supplementary Materials

Neural Information Processing Systems

In this section, we describe the details of the MiniRTS environment and the human dataset. The data do not contain any personally identifiable information or offensive content.

Figure 1: MiniRTS [2] implements the rock-paper-scissors attack graph: each army type has some units it is effective against and vulnerable to. For example, "swordman" restrains "spearman" but is restrained by "cavalry".

Figure 2: Building units can produce different army units using resources. "workshop" can produce "archer", "dragon" and "catapult", while other buildings can build one unit type. Only "peasant" ...

Game Units: There are 3 kinds of units in MiniRTS, including resource units, building units, and army units.

Resource Units: Resource units are stationary and neutral. They cannot be constructed by anyone and are created at the beginning of a game. A mine action gathers resources from resource units, and the mined resources are necessary to build new building units or army units.


Grounded Reinforcement Learning: Learning to Win the Game under Human Commands

Neural Information Processing Systems

We consider the problem of building a reinforcement learning (RL) agent that can both accomplish non-trivial tasks, like winning a real-time strategy game, and strictly follow high-level language commands from humans, like "attack", even if a command is sub-optimal. We call this novel yet important problem Grounded Reinforcement Learning (GRL). Compared with other language grounding tasks, GRL is particularly non-trivial and cannot be simply solved by pure RL or behavior cloning (BC). From the RL perspective, it is extremely challenging to derive a precise reward function for human preferences since the commands are abstract and the valid behaviors are highly complicated and multi-modal. From the BC perspective, it is impossible to obtain perfect demonstrations since human strategies in complex games are typically sub-optimal. We tackle GRL via a simple, tractable, and practical constrained RL objective and develop an iterative RL algorithm, REinforced demonstration Distillation (RED), to obtain a strong GRL policy. We evaluate the policies derived by RED, BC and pure RL methods on a simplified real-time strategy game, MiniRTS. Experimental results and human studies show that the RED policy is able to consistently follow human commands and, at the same time, achieve a higher win rate than the baselines. We release our code and present more examples at https://sites.google.com/view/grounded-rl.
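
One simple way to picture a constrained objective of this kind is a Lagrangian-penalised loss that combines a policy-gradient term (win the game) with a behaviour-cloning term on command-conditioned human demonstrations (follow the command). The sketch below is illustrative only and is not the authors' RED algorithm; all tensor and parameter names are assumptions.

import torch

def constrained_rl_loss(log_probs, advantages, demo_log_probs, lagrange_multiplier, bc_threshold):
    """log_probs/advantages come from rollouts of the current policy;
    demo_log_probs is the log-likelihood of human demonstrations under the
    current policy; the multiplier penalises exceeding a BC-loss budget."""
    rl_loss = -(log_probs * advantages).mean()   # maximise expected return
    bc_loss = -demo_log_probs.mean()             # stay close to human behaviour
    return rl_loss + lagrange_multiplier * (bc_loss - bc_threshold)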


Attention Bottlenecks for Multimodal Fusion - Supplementary Materials
Arsha Nagrani, Shan Yang, Anurag Arnab, Aren Jansen, Cordelia Schmid, Chen Sun

Neural Information Processing Systems

Here we provide additional ablation results on mini-Audioset (Sec. ...). We then provide results on two additional datasets, Moments in Time and Kinetics, in Sec. C, and perform some preliminary transfer learning experiments in Sec. E. Finally, we provide details on the AS-500K split. In this section we expand on the ablations provided in Sec. ...


Attention Bottlenecks for Multimodal Fusion

Neural Information Processing Systems

Humans perceive the world by concurrently processing and fusing high-dimensional inputs from multiple modalities such as vision and audio. Machine perception models, in stark contrast, are typically modality-specific and optimised for unimodal benchmarks, and hence late-stage fusion of final representations or predictions from each modality ('late-fusion') is still a dominant paradigm for multimodal video classification. Instead, we introduce a novel transformer-based architecture that uses 'fusion bottlenecks' for modality fusion at multiple layers. Compared to traditional pairwise self-attention, our model forces information between different modalities to pass through a small number of bottleneck latents, requiring the model to collate and condense relevant information in each modality and share what is necessary. We find that such a strategy improves fusion performance while reducing computational cost. We conduct thorough ablation studies, and achieve state-of-the-art results on multiple audio-visual classification benchmarks including Audioset, Epic-Kitchens and VGGSound. All code and models will be released.
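
A minimal PyTorch sketch of the bottleneck idea follows, assuming per-modality transformer encoder layers and a small set of shared bottleneck tokens; the dimensions, layer choices, and the averaging rule for the bottlenecks are illustrative assumptions, not the released architecture.

import torch
import torch.nn as nn

class BottleneckFusionLayer(nn.Module):
    def __init__(self, dim=256, num_bottlenecks=4, num_heads=8):
        super().__init__()
        self.bottleneck = nn.Parameter(torch.randn(1, num_bottlenecks, dim))
        self.video_layer = nn.TransformerEncoderLayer(dim, num_heads, batch_first=True)
        self.audio_layer = nn.TransformerEncoderLayer(dim, num_heads, batch_first=True)

    def forward(self, video_tokens, audio_tokens):
        b, n = video_tokens.shape[0], self.bottleneck.shape[1]
        btl = self.bottleneck.expand(b, -1, -1)
        # Each modality attends only to its own tokens plus the shared bottlenecks.
        v = self.video_layer(torch.cat([video_tokens, btl], dim=1))
        a = self.audio_layer(torch.cat([audio_tokens, btl], dim=1))
        # Cross-modal information is exchanged only through the bottleneck tokens.
        new_btl = 0.5 * (v[:, -n:] + a[:, -n:])
        return v[:, :-n], a[:, :-n], new_btl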


Introducing Routing Uncertainty in Capsule Networks
Fabio De Sousa Ribeiro

Neural Information Processing Systems

Rather than performing inefficient local iterative routing between adjacent capsule layers, we propose an alternative global view based on representing the inherent uncertainty in part-object assignment. In our formulation, the local routing iterations are replaced with variational inference of part-object connections in a probabilistic capsule network, leading to a significant speedup without sacrificing performance. In this way, global context is also considered when routing capsules by introducing global latent variables that have direct influence on the objective function, and are updated discriminatively in accordance with the minimum description length (MDL) principle. We focus on enhancing capsule network properties, and perform a thorough evaluation on pose-aware tasks, observing improvements in performance over previous approaches whilst being more computationally efficient.