AITopics | Search

Improve Agents without Retraining: Parallel Tree Search with Off-Policy Correction

Neural Information Processing Systems | Apr-25-2026, 06:35:46 GMT

Tree Search (TS) is crucial to some of the most influential successes in reinforcement learning. Here, we tackle two major challenges with TS that limit its usability: distribution shift and scalability. We first discover and analyze a counter-intuitive phenomenon: action selection through TS and a pre-trained value function often leads to lower performance compared to the original pre-trained agent, even when having access to the exact state and reward in future steps. We show this is due to a distribution shift to areas where value estimates are highly inaccurate and analyze this effect using Extreme Value theory. To overcome this problem, we introduce a novel off-policy correction term that accounts for the mismatch between the pre-trained value and its corresponding TS policy by penalizing under-sampled trajectories.
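The setup the abstract describes, choosing an action by searching a few steps ahead with exact transitions and scoring the leaves with a pre-trained value function, can be sketched as below. This is a minimal illustration, not the paper's method: it does not include the off-policy correction term, and the names `tree_search_action`, `step`, and `value_fn`, as well as the toy environment, are assumptions made here.

```python
def tree_search_action(state, actions, step, value_fn, depth, gamma=0.99):
    """Exhaustive depth-limited search with a value function at the leaves.

    step(state, action) -> (next_state, reward) is assumed deterministic and exact.
    value_fn(state) stands in for a pre-trained value estimate.
    depth must be at least 1.
    """
    def best_return(s, d):
        # At the search horizon, fall back on the (possibly inaccurate) value estimate.
        if d == 0:
            return value_fn(s)
        return max(r + gamma * best_return(s2, d - 1)
                   for s2, r in (step(s, a) for a in actions))

    scores = {}
    for a in actions:
        s2, r = step(state, a)
        scores[a] = r + gamma * best_return(s2, depth - 1)
    # The max over leaf estimates is where over-estimated values get selected.
    return max(scores, key=scores.get), scores


# Hypothetical toy environment: walk on the integers, reward 1 on reaching state 2.
def toy_step(s, a):
    s2 = s + a
    return s2, (1.0 if s2 == 2 else 0.0)

action, scores = tree_search_action(0, [-1, 1], toy_step, lambda s: 0.0, depth=2, gamma=1.0)
```

Because every leaf is scored by `value_fn` and the search takes a maximum over them, errors in the value estimate at rarely visited states can dominate the choice, which is the failure mode the abstract attributes to distribution shift.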

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Sampling without Replacement Leads to Faster Rates in Finite-Sum Minimax Optimization

Neural Information Processing Systems | Apr-25-2026, 06:34:42 GMT

We analyze the convergence rates of stochastic gradient algorithms for smooth finite-sum minimax optimization and show that, for many such algorithms, sampling the data points without replacement leads to faster convergence compared to sampling with replacement. For the smooth and strongly convex-strongly concave setting, we consider gradient descent ascent and the proximal point method, and present a unified analysis of two popular without-replacement sampling strategies, namely Random Reshuffling (RR), which shuffles the data every epoch, and Single Shuffling or Shuffle Once (SO), which shuffles only at the beginning. We obtain tight convergence rates for RR and SO and demonstrate that these strategies lead to faster convergence than uniform sampling. Moving beyond convexity, we obtain similar results for smooth nonconvex-nonconcave objectives satisfying a two-sided Polyak-Łojasiewicz inequality. Finally, we demonstrate that our techniques are general enough to analyze the effect of data-ordering attacks, where an adversary manipulates the order in which data points are supplied to the optimizer. Our analysis also recovers tight rates for the incremental gradient method, where the data points are not shuffled at all.
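The sampling schemes named in the abstract differ only in the order in which data indices are visited. A minimal sketch of those orderings follows; the function names are chosen here for illustration and the optimizer itself is left out.

```python
import random


def with_replacement(n, epochs, rng):
    """Uniform sampling: every step draws an index independently."""
    return [[rng.randrange(n) for _ in range(n)] for _ in range(epochs)]


def random_reshuffling(n, epochs, rng):
    """RR: draw a fresh permutation at the start of every epoch."""
    orders = []
    for _ in range(epochs):
        perm = list(range(n))
        rng.shuffle(perm)
        orders.append(perm)
    return orders


def shuffle_once(n, epochs, rng):
    """SO: draw one permutation up front and reuse it in every epoch."""
    perm = list(range(n))
    rng.shuffle(perm)
    return [list(perm) for _ in range(epochs)]


def incremental(n, epochs):
    """Incremental gradient order: the data are never shuffled."""
    return [list(range(n)) for _ in range(epochs)]
```

With RR, SO, and the incremental order, every data point is visited exactly once per epoch, whereas uniform sampling may repeat or skip points within an epoch.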

artificial intelligence, international conference, machine learning, (13 more...)

Genre: Research Report (0.46)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

15d6717f8bb33b3a74df26ce1eee0b9a-Paper-Conference.pdf

Neural Information Processing Systems | Apr-25-2026, 05:47:56 GMT

artificial intelligence, evolutionary algorithm, machine learning, (19 more...)

Country:

North America > United States > California (1.00)
Europe (0.93)

Genre: Research Report > New Finding (0.46)

Industry: Semiconductors & Electronics (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.67)

29539ed932d32f1c56324cded92c07c2-Paper.pdf

Neural Information Processing Systems | Apr-25-2026, 05:28:47 GMT

artificial intelligence, machine learning, matnet, (18 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.69)

27e9661e033a73a6ad8cefcde965c54d-Paper.pdf

Neural Information Processing Systems | Apr-25-2026, 04:57:04 GMT

algorithm, artificial intelligence, machine learning, (17 more...)

Country: North America > United States > Pennsylvania (0.28)

Genre: Research Report (0.68)

Industry: Health & Medicine > Therapeutic Area > Oncology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.47)

A Communication-efficient Algorithm with Linear Convergence for Federated Minimax Learning

Neural Information Processing Systems | Apr-25-2026, 04:20:59 GMT

In this paper, we study a large-scale multi-agent minimax optimization problem, which models many interesting applications in statistical learning and game theory, including Generative Adversarial Networks (GANs). The overall objective is a sum of agents' private local objective functions. We focus on the federated setting, where agents can perform local computation and communicate with a central server. Most existing federated minimax algorithms either require communication per iteration or lack performance guarantees, with the exception of Local Stochastic Gradient Descent Ascent (SGDA), a multiple-local-update descent ascent algorithm that guarantees convergence under a diminishing stepsize. By analyzing Local SGDA under the ideal condition of no gradient noise, we show that, in general, it cannot guarantee exact convergence with constant stepsizes and thus suffers from slow rates of convergence. To tackle this issue, we propose FedGDA-GT, an improved Federated (Fed) Gradient Descent Ascent (GDA) method based on Gradient Tracking (GT).
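The noise-free local-update scheme the abstract analyzes can be sketched as follows: each agent runs several descent ascent steps from the server iterate, and the server then averages the results. This is only an illustration of that baseline under assumptions made here (scalar variables, hypothetical quadratic local objectives, and the names `local_gda` and `make_quadratic_agent`). It does not implement FedGDA-GT, whose gradient-tracking update the abstract does not spell out.

```python
def make_quadratic_agent(a, c, b, d):
    """Gradient oracle for the hypothetical local objective
    f(x, y) = 0.5*a*(x - c)**2 + x*y - 0.5*b*(y - d)**2,
    which is strongly convex in x and strongly concave in y for a, b > 0."""
    def grad(x, y):
        return a * (x - c) + y, x - b * (y - d)
    return grad


def local_gda(agents, x0, y0, stepsize, local_steps, rounds):
    """Noise-free local gradient descent ascent with periodic averaging."""
    x, y = x0, y0
    for _ in range(rounds):
        xs, ys = [], []
        for grad in agents:
            xi, yi = x, y  # every agent starts from the server iterate
            for _ in range(local_steps):
                gx, gy = grad(xi, yi)
                # Descend in x, ascend in y, using local gradients only.
                xi, yi = xi - stepsize * gx, yi + stepsize * gy
            xs.append(xi)
            ys.append(yi)
        # One communication round: the server averages the local iterates.
        x, y = sum(xs) / len(xs), sum(ys) / len(ys)
    return x, y


# Two identical agents share the saddle point (0, 0).
agents = [make_quadratic_agent(1.0, 0.0, 1.0, 0.0)] * 2
x, y = local_gda(agents, 1.0, 1.0, stepsize=0.1, local_steps=2, rounds=200)
```

With identical agents the local steps agree with centralized descent ascent, so the iterates approach the shared saddle point. Giving the agents different `c` and `d` values is a way to experiment with the heterogeneous case the abstract is concerned with.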

artificial intelligence, machine learning, neural information processing system, (12 more...)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.90)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.76)

22722a343513ed45f14905eb07621686-Paper.pdf

Neural Information Processing Systems | Apr-25-2026, 02:43:53 GMT

artificial intelligence, bayesian inference, machine learning, (17 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.69)

215a71a12769b056c3c32e7299f1c5ed-Paper.pdf

Neural Information Processing Systems | Apr-25-2026, 02:05:45 GMT

data mining, machine learning, reinforcement learning, (18 more...)

Industry: Leisure & Entertainment > Games > Chess (0.30)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
(2 more...)

210b7ec74fc9cec6fb8388dbbdaf23f7-Paper.pdf

Neural Information Processing Systems | Apr-25-2026, 01:35:52 GMT

Cutting-plane methods have enabled remarkable successes in integer programming over the last few decades. State-of-the-art solvers integrate a myriad of cutting-plane techniques to speed up the underlying tree-search algorithm used to find optimal solutions. In this paper, we provide sample complexity bounds for cut selection in branch-and-cut (B&C). Given a training set of integer programs sampled from an application-specific input distribution and a family of cut selection policies, these guarantees bound the number of samples sufficient to ensure that using any policy in the family, the size of the tree B&C builds on average over the training set is close to the expected size of the tree B&C builds. We first bound the sample complexity of learning cutting planes from the canonical family of Chvátal-Gomory cuts. Our bounds handle any number of waves of any number of cuts and are fine-tuned to the magnitudes of the constraint coefficients. Next, we prove sample complexity bounds for more sophisticated cut selection policies that use a combination of scoring rules to choose from a family of cuts. Finally, beyond the realm of cutting planes for integer programming, we develop a general abstraction of tree search that captures key components such as node selection and variable selection. For this abstraction, we bound the sample complexity of learning a good policy for building the search tree.
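For readers unfamiliar with the Chvátal-Gomory family mentioned above, the standard construction is: given constraints A x <= b with x >= 0 integer and a nonnegative multiplier vector u, the inequality floor(u^T A) x <= floor(u^T b) is valid for every integer solution. The sketch below computes one such cut. The function name and the small example are illustrative assumptions, and the paper's learning procedure for choosing u is not shown.

```python
import math
from fractions import Fraction


def chvatal_gomory_cut(A, b, u):
    """Return (coeffs, rhs) of the cut floor(u^T A) x <= floor(u^T b).

    Assumes the system A x <= b with x >= 0 and x integer, and u >= 0.
    Exact rational arithmetic avoids floating-point errors in the floors.
    """
    if any(ui < 0 for ui in u):
        raise ValueError("multipliers must be nonnegative")
    rows, cols = len(A), len(A[0])
    # Aggregate the constraints with weights u, then round each term down.
    agg = [sum(Fraction(u[i]) * A[i][j] for i in range(rows)) for j in range(cols)]
    rhs = sum(Fraction(u[i]) * b[i] for i in range(rows))
    return [math.floor(c) for c in agg], math.floor(rhs)


# Example: 2*x1 + 2*x2 <= 3 with u = 1/2 gives x1 + x2 <= 1.
coeffs, rhs = chvatal_gomory_cut([[2, 2]], [3], [Fraction(1, 2)])
```

The cut removes fractional points such as (0.75, 0.75) that satisfy the original constraint, while keeping every nonnegative integer point that does.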

artificial intelligence, machine learning, sample complexity, (15 more...)

Country: North America > United States (0.68)

Industry: Leisure & Entertainment (0.37)

Technology: