DIMES: A Differentiable Meta Solver for Combinatorial Optimization Problems
Recently, deep reinforcement learning (DRL) models have shown promising results in solving NP-hard Combinatorial Optimization (CO) problems. However, most DRL solvers can only scale to a few hundred nodes for combinatorial optimization problems on graphs, such as the Traveling Salesman Problem (TSP). This paper addresses the scalability challenge in large-scale combinatorial optimization by proposing a novel approach, namely, DIMES. Unlike previous DRL methods, which suffer from costly autoregressive decoding or iterative refinement of discrete solutions, DIMES introduces a compact continuous space for parameterizing the underlying distribution of candidate solutions. Such a continuous space allows stable REINFORCE-based training and fine-tuning via massively parallel sampling. We further propose a meta-learning framework to enable effective initialization of model parameters in the fine-tuning stage. Extensive experiments show that DIMES outperforms recent DRL-based methods on large benchmark datasets for Traveling Salesman Problems and Maximal Independent Set problems.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > Illinois > Champaign County > Urbana (0.04)
- Europe > United Kingdom > Scotland > City of Glasgow > Glasgow (0.04)
- Asia > Middle East > Israel (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
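The continuous-parameterization idea the abstract describes can be sketched on a toy TSP: a real-valued per-edge "heatmap" parameterizes a distribution over tours, tours are sampled in parallel, and REINFORCE with a mean baseline descends on expected tour length. This is only a minimal illustration under assumed details — it omits DIMES's graph neural network and meta-learning, and all names here are invented for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy TSP instance: n random points in the unit square.
n = 8
pts = rng.random((n, 2))
dist = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)

# Continuous parameterization: one real-valued score per directed edge.
theta = np.zeros((n, n))

def sample_tour_and_grad(theta):
    """Sample a tour via softmax over unvisited cities; accumulate
    the score-function gradient d log p(tour) / d theta along the way."""
    tour, grad = [0], np.zeros_like(theta)
    unvisited = list(range(1, n))
    while unvisited:
        cur = tour[-1]
        logits = theta[cur, unvisited]
        p = np.exp(logits - logits.max())
        p /= p.sum()
        k = rng.choice(len(unvisited), p=p)
        # Softmax gradient: d log p(next) / d theta[cur, j] = 1[j == next] - p_j
        grad[cur, unvisited] -= p
        grad[cur, unvisited[k]] += 1.0
        tour.append(unvisited.pop(k))
    return tour, grad

def tour_length(tour):
    return sum(dist[tour[i], tour[(i + 1) % n]] for i in range(n))

# REINFORCE with a mean baseline over a batch of parallel samples.
means = []
for step in range(200):
    samples = [sample_tour_and_grad(theta) for _ in range(64)]
    costs = np.array([tour_length(t) for t, _ in samples])
    g = sum((c - costs.mean()) * gr
            for (_, gr), c in zip(samples, costs)) / len(samples)
    theta -= 0.5 * g  # gradient descent on expected tour length
    means.append(costs.mean())
```

Because sampling and the gradient estimator touch only the continuous heatmap, many candidate tours can be drawn and scored in parallel, which is the property the abstract credits for stable training and fine-tuning.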
DIME: Diffusion-Based Maximum Entropy Reinforcement Learning
Celik, Onur, Li, Zechu, Blessing, Denis, Li, Ge, Palanicek, Daniel, Peters, Jan, Chalvatzaki, Georgia, Neumann, Gerhard
Maximum entropy reinforcement learning (MaxEnt-RL) has become the standard approach to RL due to its beneficial exploration properties. Traditionally, policies are parameterized using Gaussian distributions, which significantly limits their representational capacity. Diffusion-based policies offer a more expressive alternative, yet integrating them into MaxEnt-RL poses challenges--primarily due to the intractability of computing their marginal entropy. To overcome this, we propose Diffusion-Based Maximum Entropy RL (DIME). DIME leverages recent advances in approximate inference with diffusion models to derive a lower bound on the maximum entropy objective. Additionally, we propose a policy iteration scheme that provably converges to the optimal diffusion policy. Our method enables the use of expressive diffusion-based policies while retaining the principled exploration benefits of MaxEnt-RL, significantly outperforming other diffusion-based methods on challenging high-dimensional control benchmarks. It is also competitive with state-of-the-art non-diffusion-based RL methods while requiring fewer algorithmic design choices and smaller update-to-data ratios, reducing computational complexity.
- North America > United States (0.29)
- Europe > Germany (0.28)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
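The contrast the abstract draws — tractable Gaussian entropy versus intractable diffusion-policy entropy — can be made concrete with a small numerical check. This sketch is illustrative only: the closed-form expression and Monte Carlo check are standard facts about diagonal Gaussians, and the comment about diffusion policies summarizes the abstract's motivation, not DIME's actual bound.

```python
import numpy as np

rng = np.random.default_rng(0)

# MaxEnt-RL augments the return with an entropy bonus:
#   J(pi) = E[ sum_t r_t + alpha * H(pi(. | s_t)) ].
# For a diagonal-Gaussian policy the entropy term is closed-form,
# which is why Gaussians are the traditional parameterization:
def gaussian_entropy(log_std):
    d = log_std.shape[0]
    return 0.5 * d * (1.0 + np.log(2.0 * np.pi)) + log_std.sum()

# Monte Carlo check that H = -E[log pi(a)] matches the closed form.
log_std = np.array([0.3, -0.2, 0.1])
std = np.exp(log_std)
a = rng.normal(0.0, std, size=(200_000, 3))
logp = (-0.5 * ((a / std) ** 2).sum(axis=1)
        - log_std.sum() - 1.5 * np.log(2.0 * np.pi))
mc_entropy = -logp.mean()

# A diffusion policy's density pi(a|s) is a marginal over the whole
# denoising chain, so log pi(a|s) -- and hence H -- has no closed form;
# per the abstract, DIME instead optimizes a tractable lower bound.
```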
10 of the year's most interesting auctions: Dinosaurs, coins, and Einstein's love letters
Some of 2024's most interesting science, technology, and history stories could be found in international auctions. Regardless of their final winning bids, each of the following items and artifacts is impressive in its own right. From AI-painted artwork to hunks of coal, these auction items highlight not just a wide range of artifacts from the past, but future-forward items as well. If nearly $45 million sounds like a lot for a dinosaur skeleton, you aren't alone. Although billed as one of the "finest" known examples, a stegosaurus named "Apex" almost immediately drew controversy over the summer for a final bid that came in at over 10 times Sotheby's initial estimate.
- North America > United States > California > San Francisco County > San Francisco (0.06)
- North America > United States > Nevada > Carson City (0.05)
- North America > United States > Massachusetts (0.05)
Assassins Are Having a Moment. Netflix's Addictive New Hit Captures Their Dangerous Allure.
"I don't kill anyone who doesn't deserve it," says Sam (Ben Whishaw), the self-described "triggerman"--hit man--in the new Netflix spy thriller Black Doves. Sam, like the series' other main character, Helen (not her real name, played by Keira Knightley), works for Black Doves' eponymous organization. They are spies, more or less, but spies for hire, and when you get right down to it, most of Sam's gigs seem to be carrying out hits for drug dealers. Sam isn't the only hit man featured in a sleek, starry TV thriller this winter. On Peacock, Eddie Redmayne plays Alex in a new adaptation of Frederick Forsyth's 1971 novel The Day of the Jackal.
- Media > Television (0.61)
- Media > Film (0.61)
- Information Technology > Services (0.61)
Enhancing Mathematical Reasoning in LLMs by Stepwise Correction
Wu, Zhenyu, Zeng, Qingkai, Zhang, Zhihan, Tan, Zhaoxuan, Shen, Chao, Jiang, Meng
Best-of-N decoding methods instruct large language models (LLMs) to generate multiple solutions, score each using a scoring function, and select the highest-scored as the final answer to mathematical reasoning problems. However, this repeated independent process often leads to the same mistakes, so the selected solution may still be incorrect. We propose a novel prompting method named Stepwise Correction (StepCo) that helps LLMs identify and revise incorrect steps in their generated reasoning paths. It iterates verification and revision phases that employ a process-supervised verifier. The verify-then-revise process not only improves answer correctness but also reduces token consumption, since fewer paths need to be generated. With StepCo, a series of LLMs demonstrate exceptional performance. Notably, using GPT-4o as the backend LLM, StepCo achieves an average accuracy of 94.1 across eight datasets, significantly outperforming the state-of-the-art Best-of-N method by +2.4, while reducing token consumption by 77.8%.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > Thailand (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- (6 more...)
- Research Report (1.00)
- Workflow (0.82)
- Leisure & Entertainment (0.46)
- Education > Educational Setting (0.45)
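The verify-then-revise loop the StepCo abstract describes can be sketched generically. Everything here is a stand-in: reasoning steps are `(increment, claimed_total)` pairs and the "process-supervised verifier" is a toy arithmetic checker, not the paper's learned verifier or any LLM.

```python
def verify(path):
    """Toy process verifier: return the index of the first step whose
    claimed running total is wrong, or None if every step checks out."""
    total = 0
    for i, (inc, claimed) in enumerate(path):
        total += inc
        if claimed != total:
            return i
    return None

def revise(path, i):
    """Regenerate step i from the verified prefix, keeping later steps."""
    prev = path[i - 1][1] if i > 0 else 0
    inc = path[i][0]
    return path[:i] + [(inc, prev + inc)] + path[i + 1:]

def stepco_loop(path, max_rounds=10):
    """Iterate verification and revision until the whole path verifies."""
    for _ in range(max_rounds):
        i = verify(path)
        if i is None:
            return path  # every step verified
        path = revise(path, i)
    return path

# One wrong step can cascade: fixing step 2 exposes step 3, which the
# next round then fixes -- the iterative behavior the abstract describes.
path = [(1, 1), (2, 4), (3, 7)]  # step 2 claims 4, should be 3
fixed = stepco_loop(path)
```

Unlike Best-of-N, which draws whole paths independently and can repeat the same mistake, this loop reuses the verified prefix and only regenerates from the first failing step, which is the source of the token savings the abstract reports.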