
Collaborating Authors: reviser


Learning Collaborative Policies to Solve NP-hard Routing Problems

Neural Information Processing Systems

Recently, deep reinforcement learning (DRL) frameworks have shown potential for solving NP-hard routing problems such as the traveling salesman problem (TSP) without problem-specific expert knowledge. Although DRL can be used to solve complex problems, DRL frameworks still struggle to compete with state-of-the-art heuristics, showing a substantial performance gap. This paper proposes a novel hierarchical problem-solving strategy, termed learning collaborative policies (LCP), which can effectively find near-optimum solutions using two iterative DRL policies: the seeder and the reviser. The seeder generates candidate solutions (seeds) that are as diversified as possible while being dedicated to exploring the full combinatorial action space (i.e., sequences of assignment actions). To this end, we train the seeder's policy using a simple yet effective entropy regularization reward to encourage it to find diverse solutions. The reviser, in turn, modifies each candidate solution generated by the seeder: it partitions the full trajectory into sub-tours and simultaneously revises each sub-tour to minimize its traveling distance. Thus, the reviser is trained to improve the candidate solution's quality, focusing on the reduced solution space (which is beneficial for exploitation). Extensive experiments demonstrate that the proposed two-policy collaboration scheme improves over single-policy DRL frameworks on various NP-hard routing problems, including TSP, the prize collecting TSP (PCTSP), and the capacitated vehicle routing problem (CVRP).
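The seeder-reviser collaboration described above can be sketched with classical stand-ins: random sampling plays the role of the entropy-regularized seeder, and a windowed 2-opt plays the role of the learned sub-tour reviser. Everything here (function names, the window size, the toy instance) is a hypothetical illustration of the scheme, not the paper's implementation.

```python
import math
import random

def tour_length(points, tour):
    """Total Euclidean length of a closed tour over point indices."""
    return sum(
        math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )

def seed_tours(points, num_seeds, rng):
    """Stand-in 'seeder': sample diverse random permutations. (The paper's
    seeder samples from a DRL policy trained with an entropy bonus.)"""
    n = len(points)
    return [rng.sample(range(n), n) for _ in range(num_seeds)]

def revise(points, tour, window):
    """Stand-in 'reviser': restrict 2-opt reversals to fixed-size windows
    of the tour, mimicking sub-tour revision. (The paper's reviser is
    itself a learned policy.)"""
    t = tour[:]
    n = len(t)
    for start in range(0, n, window):
        end = min(start + window, n)
        for i in range(start, end - 1):
            for j in range(i + 1, end):
                cand = t[:i] + t[i:j + 1][::-1] + t[j + 1:]
                if tour_length(points, cand) < tour_length(points, t):
                    t = cand
    return t

rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(12)]
seeds = seed_tours(points, num_seeds=8, rng=rng)
revised = [revise(points, s, window=6) for s in seeds]
best = min(revised, key=lambda t: tour_length(points, t))
```

The division of labor mirrors the paper's argument: the seeder's job is breadth (diverse candidates over the full permutation space), while the reviser's job is depth (cheap improvement over a reduced sub-tour space).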




A Details of Experiments

Neural Information Processing Systems

R is the prize of the visited node. The MDP formulation, including the training scheme, is mostly the same as that of TSP. This section provides implementation details of the seeder for the experiments. The details of setting T in the inference phase (i.e., in the experiments) are described in Appendix A.5. A.3 Detailed Implementation of the Reviser: this section describes the detailed implementation of the reviser for each target problem.





Translate-and-Revise: Boosting Large Language Models for Constrained Translation

Huang, Pengcheng, Mu, Yongyu, Wu, Yuzhang, Li, Bei, Xiao, Chunyang, Xiao, Tong, Zhu, Jingbo

arXiv.org Artificial Intelligence

Imposing constraints on machine translation systems is challenging because these systems are not trained to make use of constraints when generating adequate, fluent translations. In this paper, we leverage the capabilities of large language models (LLMs) for constrained translation, given that LLMs can easily adapt to this task by taking translation instructions and constraints as prompts. However, LLMs cannot always guarantee the adequacy of a translation and, in some cases, ignore the given constraints. This is in part because LLMs might be overly confident in their predictions, overriding the influence of the constraints. To overcome this overriding behaviour, we propose to add a revision process that encourages LLMs to correct their outputs by prompting them about the constraints that have not yet been met. We evaluate our approach on four constrained translation tasks, encompassing both lexical and structural constraints in multiple constraint domains. Experiments show a 15% improvement in constraint-based translation accuracy over standard LLMs, and the approach also significantly outperforms state-of-the-art neural machine translation (NMT) methods.
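The translate-then-revise loop lends itself to a short sketch: translate once, check which lexical constraints are still unmet, and re-prompt about exactly those. The prompt wording, function names, and the `llm` callable below are assumptions for illustration, not the paper's prompts; a real system would pass an actual LLM client as `llm`.

```python
def unmet_constraints(translation, constraints):
    """Lexical check: which required target terms are still missing?"""
    return [c for c in constraints if c not in translation]

def translate_with_revision(source, constraints, llm, max_rounds=3):
    """Prompt for a translation, then re-prompt the model about any
    constraints it ignored, up to a fixed revision budget."""
    output = llm(
        f"Translate into English: {source}\n"
        f"The translation must contain: {', '.join(constraints)}"
    )
    for _ in range(max_rounds):
        missing = unmet_constraints(output, constraints)
        if not missing:
            break
        output = llm(
            f"Revise the translation so it also contains "
            f"{', '.join(missing)}:\n{output}"
        )
    return output
```

The key design point is that the revision prompt mentions only the constraints that were violated, which is what counteracts the model's tendency to override them.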


Self-Evolution Fine-Tuning for Policy Optimization

Chen, Ruijun, Liang, Jiehao, Gao, Shiping, Wan, Fanqi, Quan, Xiaojun

arXiv.org Artificial Intelligence

The alignment of large language models (LLMs) is crucial not only for unlocking their potential in specific tasks but also for ensuring that responses meet human expectations and adhere to safety and ethical principles. Current alignment methodologies face considerable challenges. For instance, supervised fine-tuning (SFT) requires extensive, high-quality annotated samples, while reinforcement learning from human feedback (RLHF) is complex and often unstable. In this paper, we introduce self-evolution fine-tuning (SEFT) for policy optimization, with the aim of eliminating the need for annotated samples while retaining the stability and efficiency of SFT. SEFT first trains an adaptive reviser to elevate low-quality responses while maintaining high-quality ones. The reviser then gradually guides the policy's optimization by fine-tuning it with enhanced responses. One of the prominent features of this method is its ability to leverage unlimited amounts of unannotated data for policy optimization through supervised fine-tuning. Our experiments on AlpacaEval 2.0 and MT-Bench demonstrate the effectiveness of SEFT. We also provide a comprehensive analysis of its advantages over existing alignment techniques.
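The SEFT data-construction step described above can be sketched as a filter-or-revise loop: responses that already meet a quality bar are kept, the rest are routed through the reviser, and the enhanced pairs become SFT targets. The function names, the scalar `quality` scorer, and the `threshold` are hypothetical stand-ins; in the paper both the reviser and the quality judgment are learned.

```python
def seft_step(policy_outputs, reviser, quality, threshold):
    """One SEFT data-construction step: keep responses that already meet
    the quality bar, route the rest through the reviser, and return the
    enhanced (prompt, response) pairs used as SFT targets."""
    enhanced = []
    for prompt, response in policy_outputs:
        if quality(response) < threshold:
            # Low-quality response: let the reviser elevate it.
            response = reviser(prompt, response)
        enhanced.append((prompt, response))
    return enhanced
```

Because the output is just a set of supervised targets, the policy update itself stays plain SFT, which is where the method's stability claim comes from.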


GLOP: Learning Global Partition and Local Construction for Solving Large-scale Routing Problems in Real-time

Ye, Haoran, Wang, Jiarui, Liang, Helan, Cao, Zhiguang, Li, Yong, Li, Fanzhang

arXiv.org Artificial Intelligence

The recent end-to-end neural solvers have shown promise for small-scale routing problems but suffered from limited real-time scaling-up performance. This paper proposes GLOP (Global and Local Optimization Policies), a unified hierarchical framework that efficiently scales toward large-scale routing problems. GLOP partitions large routing problems into Travelling Salesman Problems (TSPs) and TSPs into Shortest Hamiltonian Path Problems. For the first time, we hybridize non-autoregressive neural heuristics for coarse-grained problem partitions and autoregressive neural heuristics for fine-grained route constructions, leveraging the scalability of the former and the meticulousness of the latter. Experimental results show that GLOP achieves competitive and state-of-the-art real-time performance on large-scale routing problems, including TSP, ATSP, CVRP, and PCTSP.
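The global-partition / local-construction hierarchy can be sketched with classical stand-ins: a grid partition plays the coarse non-autoregressive policy, and a greedy nearest-neighbour path plays the fine autoregressive constructor. The grid, the stitching order, and all names below are illustrative assumptions, not GLOP's learned components.

```python
import math
import random

def grid_partition(points, cells):
    """Coarse global step: bucket nodes into a cells x cells grid.
    (GLOP learns this partition with a non-autoregressive policy.)"""
    buckets = {}
    for idx, (x, y) in enumerate(points):
        key = (min(int(x * cells), cells - 1), min(int(y * cells), cells - 1))
        buckets.setdefault(key, []).append(idx)
    return [buckets[k] for k in sorted(buckets)]

def greedy_path(points, idxs):
    """Fine local step: greedy Hamiltonian path through one cell.
    (GLOP constructs these paths with an autoregressive policy.)"""
    path, rest = [idxs[0]], set(idxs[1:])
    while rest:
        last = points[path[-1]]
        nxt = min(rest, key=lambda i: math.dist(last, points[i]))
        path.append(nxt)
        rest.remove(nxt)
    return path

def hierarchical_tour(points, cells=3):
    """Stitch per-cell paths (in grid order) into one full tour."""
    tour = []
    for cell in grid_partition(points, cells):
        tour.extend(greedy_path(points, cell))
    return tour

rng = random.Random(1)
pts = [(rng.random(), rng.random()) for _ in range(30)]
tour = hierarchical_tour(pts, cells=3)
```

The scaling argument is visible even in this toy: each local solve touches only one cell's nodes, so cost grows with cell size rather than with the full instance.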


Conflict-Aware Active Automata Learning

Ferreira, Tiago, Henry, Léo, da Silva, Raquel Fernandes, Silva, Alexandra

arXiv.org Artificial Intelligence

Formal methods have a long history of success in the analysis of critical systems through abstract models. These methods are rapidly expanding their range of applications, and recent years have seen an increase in industrial teams applying them to (large-scale) software [6, 9, 10, 12, 13, 25]. The applicability of such methods is limited by the availability of good models, which require time and expert knowledge to be hand-crafted and updated. To overcome this issue, a research area on the automatic inference of models, called model learning [32], has gained popularity. Broadly, there are two classes of model learning: passive learning, which attempts to infer a formal model from a static log, and active learning, where interaction with the system is allowed to refine knowledge during inference. In this paper, we focus on active learning, motivated by its successful use in verification tasks, e.g. in analyzing network protocol implementations such as TCP [18], SSH [19], and QUIC [15], or in understanding the timing behavior of modern CPUs [34]. Current state-of-the-art active learning algorithms rely on the Minimally Adequate Teacher (MAT) framework [4], which formalizes a process with two agents: a learner and a teacher. The learner tries to infer a formal model of a system, and the teacher is omniscient of the system, being able to answer queries on potential behaviors and the correctness of the learned model. MAT assumes that the interactions between both agents are perfect and deterministic. In practice, however, interactions with the System Under Learning (SUL) are often non-deterministic in some way, e.g. the communications can be noisy (i.e.
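The MAT interaction the abstract describes, a learner issuing membership and equivalence queries to an omniscient teacher, can be sketched minimally. The toy target language, the lookup-table learner, and the bounded equivalence check are all simplifying assumptions: a real active learner such as L* builds an automaton, and a real teacher's equivalence query is exact rather than bounded.

```python
from itertools import product

def words(alphabet, up_to):
    """All words over the alphabet up to a given length."""
    for n in range(up_to + 1):
        for w in product(alphabet, repeat=n):
            yield "".join(w)

class Teacher:
    """Omniscient MAT teacher for a toy target language: strings over
    {a, b} with an even number of 'a's."""
    def member(self, word):
        return word.count("a") % 2 == 0
    def equivalent(self, hypothesis):
        # Approximate the equivalence query by testing all short words;
        # return a counterexample if the hypothesis disagrees, else None.
        for w in words("ab", 6):
            if hypothesis(w) != self.member(w):
                return w
        return None

def learn(teacher, bound=6):
    """Toy learner: memorise answers to membership queries on short words
    and hypothesise by lookup. (A real learner builds an automaton.)"""
    table = {w: teacher.member(w) for w in words("ab", bound)}
    return lambda w: table.get(w, False)
```

The paper's point sits exactly at the `member` call: MAT assumes it always returns the same, correct answer, while real Systems Under Learning can answer noisily or non-deterministically.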