AITopics | rms

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report > Experimental Study (0.92)

Industry:

Energy (0.67)
Information Technology (0.45)
Government (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Neural Information Processing SystemsFeb-15-2026, 20:08:34 GMT

The Feature Speed Formula: a flexible approach to scale hyper-parameters of deep neural networks

Deep learning succeeds by doing hierarchical feature learning, yet tuning hyper-parameters (HP) such as initialization scales, learning rates etc., only give indirect

artificial intelligence, initialization, machine learning, (16 more...)

Country:

North America > United States (0.14)
Europe > Switzerland > Vaud > Lausanne (0.04)

Genre: Research Report > Experimental Study (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing SystemsFeb-12-2026, 04:38:21 GMT

LearningRewardMachinesforPartially ObservableReinforcementLearning

RL agents learn policies from experience.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

Country:

North America > Canada > Ontario > Toronto (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Neural Information Processing SystemsFeb-8-2026, 23:44:28 GMT

1cf760a547822e2b8276881ad45f0fe9-Paper-Conference.pdf

Such a question is very relevant: boostinghas quickly evolvedasatechnique requiring first-order information about the loss optimized [6, Section 10.3], [41, Section 7.2.2]

artificial intelligence, justification, machine learning, (18 more...)

Country:

North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
Europe > Romania > Sud-Est Development Region > Constanța County > Constanța (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing SystemsDec-25-2025, 09:31:48 GMT

Learning Reward Machines for Partially Observable Reinforcement Learning

Reward Machines (RMs), originally proposed for specifying problems in Reinforcement Learning (RL), provide a structured, automata-based representation of a reward function that allows an agent to decompose problems into subproblems that can be efficiently learned using off-policy learning. Here we show that RMs can be learned from experience, instead of being specified by the user, and that the resulting problem decomposition can be used to effectively solve partially observable RL problems. We pose the task of learning RMs as a discrete optimization problem where the objective is to find an RM that decomposes the problem into a set of subproblems such that the combination of their optimal memoryless policies is an optimal policy for the original problem. We show the effectiveness of this approach on three partially observable domains, where it significantly outperforms A3C, PPO, and ACER, and discuss its advantages, limitations, and broader potential.

learning reward machine, observable reinforcement learning, reinforcement learning, (3 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.98)

Davis, Damek, Drusvyatskiy, Dmitriy

When do spectral gradient updates help in deep learning?

arXiv.org Machine LearningDec-5-2025

Spectral gradient methods, such as the recently popularized Muon optimizer, are a promising alternative to standard Euclidean gradient descent for training deep neural networks and transformers, but it is still unclear in which regimes they are expected to perform better. We propose a simple layerwise condition that predicts when a spectral update yields a larger decrease in the loss than a Euclidean gradient step. This condition compares, for each parameter block, the squared nuclear-to-Frobenius ratio of the gradient to the stable rank of the incoming activations. To understand when this condition may be satisfied, we first prove that post-activation matrices have low stable rank at Gaussian initialization in random feature regression, feedforward networks, and transformer blocks. In spiked random feature models we then show that, after a short burn-in, the Euclidean gradient's nuclear-to-Frobenius ratio grows with the data dimension while the stable rank of the activations remains bounded, so the predicted advantage of spectral updates scales with dimension. We validate these predictions in synthetic regression experiments and in NanoGPT-scale language model training, where we find that intermediate activations have low-stable-rank throughout training and the corresponding gradients maintain large nuclear-to-Frobenius ratios. Together, these results identify conditions for spectral gradient methods, such as Muon, to be effective in training deep networks and transformers.

activation, matrix, stable rank, (16 more...)

arXiv.org Machine Learning

2512.04299

Country:

Asia > Middle East > Jordan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Pennsylvania (0.04)
(6 more...)

Genre: Research Report > New Finding (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Masoni, Matteo, Palermo, Vincenzo, Gabiccini, Marco, Gulisano, Martino, Previati, Giorgio, Gobbi, Massimiliano, Comolli, Francesco, Mastinu, Gianpiero, Guiggiani, Massimo

On Disturbance-Aware Minimum-Time Trajectory Planning: Evidence from Tests on a Dynamic Driving Simulator

arXiv.org Artificial IntelligenceDec-5-2025

This work investigates how disturbance-aware, robustness-embedded reference trajectories translate into driving performance when executed by professional drivers in a dynamic simulator. Three planned reference trajectories are compared against a free-driving baseline (NOREF) to assess trade-offs between lap time (LT) and steering effort (SE): NOM, the nominal time-optimal trajectory; TLC, a track-limit-robust trajectory obtained by tightening margins to the track edges; and FLC, a friction-limit-robust trajectory obtained by tightening against axle and tire saturation. All trajectories share the same minimum lap-time objective with a small steering-smoothness regularizer and are evaluated by two professional drivers using a high-performance car on a virtual track. The trajectories derive from a disturbance-aware minimum-lap-time framework recently proposed by the authors, where worst-case disturbance growth is propagated over a finite horizon and used to tighten tire-friction and track-limit constraints, preserving performance while providing probabilistic safety margins. LT and SE are used as performance indicators, while RMS lateral deviation, speed error, and drift angle characterize driving style. Results show a Pareto-like LT-SE trade-off: NOM yields the shortest LT but highest SE; TLC minimizes SE at the cost of longer LT; FLC lies near the efficient frontier, substantially reducing SE relative to NOM with only a small LT increase. Removing trajectory guidance (NOREF) increases both LT and SE, confirming that reference trajectories improve pace and control efficiency. Overall, the findings highlight reference-based and disturbance-aware planning, especially FLC, as effective tools for training and for achieving fast yet stable trajectories.

artificial intelligence, simulator, trajectory, (17 more...)

2512.04917

Country: Europe > Italy (0.28)

Genre: Research Report > New Finding (0.66)

Industry:

Transportation > Ground > Road (1.00)
Leisure & Entertainment > Sports > Motorsports (1.00)
Automobiles & Trucks (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.46)

Jia, Honggang, Cheng, Nan, Wang, Xiucheng

RadioMapMotion: A Dataset and Baseline for Proactive Spatio-Temporal Radio Environment Prediction

arXiv.org Artificial IntelligenceNov-25-2025

Radio maps (RMs), which provide location-based pathloss estimations, are fundamental to enabling proactive, environment-aware communication in 6G networks. However, existing deep learning-based methods for RM construction often model dynamic environments as a series of independent static snapshots, thereby omitting the temporal continuity inherent in signal propagation changes caused by the motion of dynamic entities. To address this limitation, we propose the task of spatio-temporal RM prediction, which involves forecasting a sequence of future maps from historical observations. A key barrier to this predictive approach has been the lack of datasets capturing continuous environmental evolution. To fill this gap, we introduce RadioMapMotion, the first large-scale public dataset of continuous RM sequences generated from physically consistent vehicle trajectories. As a baseline for this task, we propose RadioLSTM, a UNet architecture based on Convolutional Long Short-Term Memory (ConvLSTM) and designed for multi-step sequence forecasting. Experimental evaluations show that RadioLSTM achieves higher prediction accuracy and structural fidelity compared to representative baseline methods. Furthermore, the model exhibits a low inference latency, indicating its potential suitability for real-time network operations. Our project will be publicly released at: https://github.com/UNIC-Lab/RadioMapMotion upon paper acceptance.

artificial intelligence, machine learning, modeling & simulation, (19 more...)

2511.17526

Genre: Research Report (0.40)

Industry: Transportation (0.46)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial IntelligenceNov-17-2025

DualVision ArthroNav: Investigating Opportunities to Enhance Localization and Reconstruction in Image-based Arthroscopy Navigation via External Cameras

Shu, Hongchao, Seenivasan, Lalithkumar, Liu, Mingxu, Hwang, Yunseo, Ku, Yu-Chun, Knopf, Jonathan, Martin-Gomez, Alejandro, Armand, Mehran, Unberath, Mathias

Arthroscopic procedures can greatly benefit from navigation systems that enhance spatial awareness, depth perception, and field of view. However, existing optical tracking solutions impose strict workspace constraints and disrupt surgical workflow. Vision-based alternatives, though less invasive, often rely solely on the monocular arthroscope camera, making them prone to drift, scale ambiguity, and sensitivity to rapid motion or occlusion. We propose DualVision ArthroNav, a multi-camera arthroscopy navigation system that integrates an external camera rigidly mounted on the arthroscope. The external camera provides stable visual odometry and absolute localization, while the monocular arthroscope video enables dense scene reconstruction. By combining these complementary views, our system resolves the scale ambiguity and long-term drift inherent in monocular SLAM and ensures robust relocalization. Experiments demonstrate that our system effectively compensates for calibration errors, achieving an average absolute trajectory error of 1.09 mm. The reconstructed scenes reach an average target registration error of 2.16 mm, with high visual fidelity (SSIM = 0.69, PSNR = 22.19). These results indicate that our system provides a practical and cost-efficient solution for arthroscopic navigation, bridging the gap between optical tracking and purely vision-based systems, and paving the way toward clinically deployable, fully vision-based arthroscopic guidance.

artificial intelligence, dualvision arthronav system, reconstruction, (8 more...)

2511.10699

Country: North America > United States > Arkansas > Washington County > Fayetteville (0.15)

Genre: Research Report (0.65)

Industry: Health & Medicine > Surgery (0.65)

Technology: Information Technology > Artificial Intelligence > Vision (1.00)

arXiv.org Artificial IntelligenceNov-5-2025

LongRM: Revealing and Unlocking the Context Boundary of Reward Modeling

Tang, Zecheng, Ji, Baibei, Qiu, Quantong, Wang, Haitian, Liang, Xiaobo, Li, Juntao, Zhang, Min

Reward model (RM) plays a pivotal role in aligning large language model (LLM) with human preferences. As real-world applications increasingly involve long history trajectories, e.g., LLM agent, it becomes indispensable to evaluate whether a model's responses are not only high-quality but also grounded in and consistent with the provided context. Yet, current RMs remain confined to short-context settings and primarily focus on response-level attributes (e.g., safety or helpfulness), while largely neglecting the critical dimension of long context-response consistency. In this work, we introduce Long-RewardBench, a benchmark specifically designed for long-context RM evaluation, featuring both Pairwise Comparison and Best-of-N tasks. Our preliminary study reveals that even state-of-the-art generative RMs exhibit significant fragility in long-context scenarios, failing to maintain context-aware preference judgments. Motivated by the analysis of failure patterns observed in model outputs, we propose a general multi-stage training strategy that effectively scales arbitrary models into robust Long-context RMs (LongRMs). Experiments show that our approach not only substantially improves performance on long-context evaluation but also preserves strong short-context capability. Notably, our 8B LongRM outperforms much larger 70B-scale baselines and matches the performance of the proprietary Gemini 2.5 Pro model.

arxiv preprint arxiv, large language model, machine learning, (21 more...)

2510.06915

Country:

Europe > Spain > Galicia > Madrid (0.04)
Asia > Singapore (0.04)
Asia > Indonesia > Bali (0.04)
Asia > India (0.04)

Genre: Research Report > New Finding (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)