AITopics | remax

e8e30fda5ab87ea93360a36288ac0145-Paper-Conference.pdf

Neural Information Processing Systems, Apr-30-2026, 03:55:03 GMT

machine learning, natural language, segmentation, (18 more...)

Neural Information Processing Systems

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Sensing and Signal Processing > Image Processing (0.69)
(2 more...)

ReMaX: Relaxing for Better Training on Efficient Panoptic Segmentation

Neural Information Processing Systems, Feb-17-2026, 17:48:31 GMT

This paper presents a new mechanism to facilitate the training of mask transformers for efficient panoptic segmentation, democratizing its deployment. We observe that the high complexity of the panoptic segmentation training objective inevitably leads to much higher penalization of false positives.

artificial intelligence, machine learning, segmentation, (17 more...)

Neural Information Processing Systems

Country: Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

ReMaX: Relaxing for Better Training on Efficient Panoptic Segmentation

Neural Information Processing Systems, Dec-27-2025, 02:30:50 GMT

This paper presents a new mechanism to facilitate the training of mask transformers for efficient panoptic segmentation, democratizing its deployment. We observe that the high complexity of the panoptic segmentation training objective inevitably leads to much higher penalization of false positives. Such an unbalanced loss makes training end-to-end mask-transformer-based architectures difficult, especially for efficient models. In this paper, we present ReMaX, which adds relaxation to mask predictions and class predictions during the training phase for panoptic segmentation. We demonstrate that, with these simple relaxation techniques during training, our model is consistently improved by a clear margin without any extra computational cost at inference. Combined with efficient backbones like MobileNetV3-Small, our method achieves new state-of-the-art results for efficient panoptic segmentation on COCO, ADE20K and Cityscapes.

better training, efficient panoptic segmentation, panoptic segmentation, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Outcome-based Reinforcement Learning to Predict the Future

Turtel, Benjamin, Franklin, Danny, Skotheim, Kris, Hewitt, Luke, Schoenegger, Philipp

arXiv.org Artificial Intelligence, Dec-2-2025

Reinforcement Learning with Verifiable Rewards (RLVR) has been an effective approach for improving Large Language Models' reasoning in domains such as coding and mathematics. Here, we apply RLVR methods to forecasting future real-world events, a challenging task for RL due to the very noisy (and delayed) outcomes involved. Using a novel dataset of recent questions from a prediction market, and accompanying relevant news headlines, we show that a compact (14B) reasoning model can be trained to match or surpass the predictive accuracy of frontier models like o1, while greatly improving probabilistic calibration. The model's performance is also practically meaningful: in a Polymarket trading simulation, we estimate that its bets would have yielded a return on investment of over 10% across all questions in the test set. We detail and compare approaches used in training our model, including augmenting our training data with synthetic prediction questions, guardrails for learning stability, and median prediction sampling at inference time.
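
As a rough illustration of the "median prediction sampling" idea mentioned in the abstract, the sketch below aggregates several sampled probability forecasts into one by taking their median. The abstract gives no implementation details, so the function names (`median_forecast`, `brier_score`), the clipping step, and the use of the Brier score as an accuracy measure are all assumptions made for illustration, not the authors' code.

```python
from statistics import median


def median_forecast(samples):
    """Aggregate sampled probability forecasts by taking their median.

    `samples` is a list of probabilities for the same yes/no question,
    e.g. parsed from several independent model generations (hypothetical setup).
    """
    if not samples:
        raise ValueError("need at least one sampled forecast")
    # Clip to [0, 1] so a malformed sample cannot push the result out of range.
    clipped = [min(1.0, max(0.0, p)) for p in samples]
    return median(clipped)


def brier_score(forecast, outcome):
    """Squared error between a probability forecast and a binary outcome (0 or 1)."""
    return (forecast - outcome) ** 2
```

The median is less sensitive than the mean to a single extreme sample, which is one plausible reason to prefer it when individual generations are noisy.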

arxiv preprint arxiv, large language model, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2505.17989

Country: Europe (0.14)

Genre: Research Report (1.00)

Industry: Banking & Finance > Trading > Prediction Market (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.86)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)

ReMaX: Relaxing for Better Training on Efficient Panoptic Segmentation

Neural Information Processing Systems, Jan-20-2025, 01:28:44 GMT

This paper presents a new mechanism to facilitate the training of mask transformers for efficient panoptic segmentation, democratizing its deployment. We observe that the high complexity of the panoptic segmentation training objective inevitably leads to much higher penalization of false positives. Such an unbalanced loss makes training end-to-end mask-transformer-based architectures difficult, especially for efficient models. In this paper, we present ReMaX, which adds relaxation to mask predictions and class predictions during the training phase for panoptic segmentation. We demonstrate that, with these simple relaxation techniques during training, our model is consistently improved by a clear margin without any extra computational cost at inference.

better training, efficient panoptic segmentation, panoptic segmentation, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models

Li, Ziniu, Xu, Tian, Zhang, Yushun, Lin, Zhihang, Yu, Yang, Sun, Ruoyu, Luo, Zhi-Quan

arXiv.org Artificial Intelligence, Dec-16-2023

Alignment is crucial for training large language models. The predominant strategy is Reinforcement Learning from Human Feedback (RLHF), with Proximal Policy Optimization (PPO) as the de-facto algorithm. Yet, PPO is known to struggle with computational inefficiency, a challenge that this paper aims to address. We identify three important properties of RLHF tasks: fast simulation, deterministic transitions, and trajectory-level rewards, which are not leveraged in PPO. Based on these properties, we develop ReMax, a new algorithm tailored for RLHF. The design of ReMax builds on the celebrated algorithm REINFORCE but is enhanced with a new variance-reduction technique. ReMax offers threefold advantages over PPO: first, it is simple to implement with just 6 lines of code. It further eliminates more than 4 hyper-parameters in PPO, which are laborious to tune. Second, ReMax reduces memory usage by about 50%. To illustrate, PPO runs out of memory when fine-tuning a Llama2-7B model on A100-80GB GPUs, whereas ReMax can support the training. Even though memory-efficient techniques (e.g., ZeRO and offload) are employed for PPO to afford training, ReMax can utilize a larger batch size to increase throughput. Third, in terms of wall-clock time, PPO is about twice as slow as ReMax per iteration. Importantly, these improvements do not sacrifice task performance. We hypothesize that these advantages can be maintained in larger-scale models.
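
As a rough illustration of the REINFORCE-with-baseline idea the abstract refers to, the sketch below runs a policy-gradient update on a toy one-step softmax policy. The abstract does not specify the variance-reduction technique; subtracting the reward of the greedy action is an assumption made here for illustration, and `reinforce_step`, `reward_fn`, and `lr` are hypothetical names rather than the authors' code.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(logits, reward_fn, lr=0.1, rng=random):
    """One REINFORCE update with a baseline on a toy one-step policy.

    `reward_fn(action)` returns a scalar reward for the chosen action.
    """
    probs = softmax(logits)
    # Sample an action from the current policy.
    action = rng.choices(range(len(logits)), weights=probs)[0]
    # Baseline (assumption for illustration): reward of the greedy action.
    greedy = max(range(len(logits)), key=lambda i: logits[i])
    advantage = reward_fn(action) - reward_fn(greedy)
    # For a softmax policy, d log pi(action) / d logit_i = 1[i == action] - probs[i].
    return [
        logit + lr * advantage * ((1.0 if i == action else 0.0) - p)
        for i, (logit, p) in enumerate(zip(logits, probs))
    ]
```

Repeatedly applying `reinforce_step` to logits such as `[0.0, 0.0, 0.0]` with a fixed reward per action should shift probability mass toward the highest-reward action. Subtracting a baseline leaves the expected gradient unchanged while typically reducing its variance.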

algorithm, arxiv preprint arxiv, remax, (14 more...)

arXiv.org Artificial Intelligence

2310.10505

Country:

Asia > Middle East > Jordan (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine (0.93)
Leisure & Entertainment (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

REMAX: Relational Representation for Multi-Agent Exploration

Ryu, Heechang, Shin, Hayong, Park, Jinkyoo

arXiv.org Artificial Intelligence, Aug-12-2020

Training a multi-agent reinforcement learning (MARL) model is generally difficult because there are numerous combinations of complex interactions among agents that induce certain reward signals. Especially when there is a sparse reward signal, the training becomes more difficult. Previous studies have tried to resolve this issue by employing an intrinsic reward, which is a signal specifically designed for inducing the interactions among agents, to boost the MARL model training. However, this approach requires extensive prior knowledge to design an intrinsic reward. To optimize the training of an MARL model, we propose a learning-based exploration strategy to generate the initial states of a game. The proposed method adopts a variational graph autoencoder to represent a state of a game such that (1) the state can be compactly encoded to the latent representation by considering the relationship among agents, and (2) the latent representation can be used as an effective input to the surrogate model predicting the exploration score. The proposed method determines the latent representations that maximize the surrogate model and decodes these representations to generate the initial states from which the MARL model starts training. Empirically, we demonstrate that the generated states improve the training and performance of MARL more than the existing exploration methods.

agent, neural network, upstream oil & gas, (19 more...)

arXiv.org Artificial Intelligence

2008.05214

Genre: Research Report (0.64)

Industry: Energy > Oil & Gas > Upstream (0.34)

Technology: