Search
Leader Reward for POMO-Based Neural Combinatorial Optimization
Wang, Chaoyang, Cheng, Pengzhi, Li, Jingze, Sun, Weiwei
Deep neural networks based on reinforcement learning (RL) for solving combinatorial optimization (CO) problems are developing rapidly and have shown a tendency to approach or even outperform traditional solvers. However, existing methods overlook an important distinction: CO problems differ from other traditional problems in that they focus solely on the optimal solution provided by the model within a specific length of time, rather than considering the overall quality of all solutions generated by the model. In this paper, we propose Leader Reward and apply it during two different training phases of the Policy Optimization with Multiple Optima (POMO) [Kwon et al., 2020] model to enhance the model's ability to generate optimal solutions. This approach is applicable to a variety of CO problems, such as the Traveling Salesman Problem (TSP), the Capacitated Vehicle Routing Problem (CVRP), and the Flexible Flow Shop Problem (FFSP), but also works well with other POMO-based models or inference phase's strategies. We demonstrate that Leader Reward greatly improves the quality of the optimal solutions generated by the model. Specifically, we reduce the POMO's gap to the optimum by more than 100 times on TSP100 with almost no additional computational overhead.
Trading Volume Maximization with Online Learning
Cesari, Tommaso, Colomboni, Roberto
We explore brokerage between traders in an online learning framework. At any round $t$, two traders meet to exchange an asset, provided the exchange is mutually beneficial. The broker proposes a trading price, and each trader tries to sell their asset or buy the asset from the other party, depending on whether the price is higher or lower than their private valuations. A trade happens if one trader is willing to sell and the other is willing to buy at the proposed price. Previous work provided guidance to a broker aiming at enhancing traders' total earnings by maximizing the gain from trade, defined as the sum of the traders' net utilities after each interaction. In contrast, we investigate how the broker should behave to maximize the trading volume, i.e., the total number of trades. We model the traders' valuations as an i.i.d. process with an unknown distribution. If the traders' valuations are revealed after each interaction (full-feedback), and the traders' valuations cumulative distribution function (cdf) is continuous, we provide an algorithm achieving logarithmic regret and show its optimality up to constant factors. If only their willingness to sell or buy at the proposed price is revealed after each interaction ($2$-bit feedback), we provide an algorithm achieving poly-logarithmic regret when the traders' valuations cdf is Lipschitz and show that this rate is near-optimal. We complement our results by analyzing the implications of dropping the regularity assumptions on the unknown traders' valuations cdf. If we drop the continuous cdf assumption, the regret rate degrades to $\Theta(\sqrt{T})$ in the full-feedback case, where $T$ is the time horizon. If we drop the Lipschitz cdf assumption, learning becomes impossible in the $2$-bit feedback case.
Pure Planning to Pure Policies and In Between with a Recursive Tree Planner
A recursive tree planner (RTP) is designed to function as a pure planner without policies at one extreme and run a pure greedy policy at the other. In between, the RTP exploits policies to improve planning performance and improve zero-shot transfer from one class of planning problem to another. Policies are learned through imitation of the planner. These are then used by the planner to improve policies in a virtuous cycle. To improve planning performance and zero-shot transfer, the RTP incorporates previously learned tasks as generalized actions (GA) at any level of its hierarchy, and can refine those GA by adding primitive actions at any level too. For search, the RTP uses a generalized Dijkstra algorithm [Dijkstra 1959] which tries the greedy policy first and then searches over near-greedy paths and then farther away as necessary. The RPT can return multiple sub-goals from lower levels as well as boundary states near obstacles, and can exploit policies with background and object-number invariance. Policies at all levels of the hierarchy can be learned simultaneously or in any order or come from outside the framework. The RTP is tested here on a variety of Box2d [Cato 2022] problems, including the classic lunar lander [Farama 2022], and on the MuJoCo [Todorov et al 2012] inverted pendulum.
Explaining Expert Search and Team Formation Systems with ExES
Golzadeh, Kiarash, Golab, Lukasz, Szlichta, Jaroslaw
Expert search and team formation systems operate on collaboration networks, with nodes representing individuals, labeled with their skills, and edges denoting collaboration relationships. Given a keyword query corresponding to the desired skills, these systems identify experts that best match the query. However, state-of-the-art solutions to this problem lack transparency. To address this issue, we propose ExES, a tool designed to explain expert search and team formation systems using factual and counterfactual methods from the field of explainable artificial intelligence (XAI). ExES uses factual explanations to highlight important skills and collaborations, and counterfactual explanations to suggest new skills and collaborations to increase the likelihood of being identified as an expert. Towards a practical deployment as an interactive explanation tool, we present and experimentally evaluate a suite of pruning strategies to speed up the explanation search. In many cases, our pruning strategies make ExES an order of magnitude faster than exhaustive search, while still producing concise and actionable explanations.
GASE: Graph Attention Sampling with Edges Fusion for Solving Vehicle Routing Problems
Wang, Zhenwei, Bai, Ruibin, Khan, Fazlullah, Ozcan, Ender, Zhang, Tiehua
Learning-based methods have become increasingly popular for solving vehicle routing problems due to their near-optimal performance and fast inference speed. Among them, the combination of deep reinforcement learning and graph representation allows for the abstraction of node topology structures and features in an encoder-decoder style. Such an approach makes it possible to solve routing problems end-to-end without needing complicated heuristic operators designed by domain experts. Existing research studies have been focusing on novel encoding and decoding structures via various neural network models to enhance the node embedding representation. Despite the sophisticated approaches applied, there is a noticeable lack of consideration for the graph-theoretic properties inherent to routing problems. Moreover, the potential ramifications of inter-nodal interactions on the decision-making efficacy of the models have not been adequately explored. To bridge this gap, we propose an adaptive Graph Attention Sampling with the Edges Fusion framework (GASE),where nodes' embedding is determined through attention calculation from certain highly correlated neighbourhoods and edges, utilizing a filtered adjacency matrix. In detail, the selections of particular neighbours and adjacency edges are led by a multi-head attention mechanism, contributing directly to the message passing and node embedding in graph attention sampling networks. Furthermore, we incorporate an adaptive actor-critic algorithm with policy improvements to expedite the training convergence. We then conduct comprehensive experiments against baseline methods on learning-based VRP tasks from different perspectives. Our proposed model outperforms the existing methods by 2.08\%-6.23\% and shows stronger generalization ability, achieving state-of-the-art performance on randomly generated instances and real-world datasets.
CReMa: Crisis Response through Computational Identification and Matching of Cross-Lingual Requests and Offers Shared on Social Media
Lamsal, Rabindra, Read, Maria Rodriguez, Karunasekera, Shanika, Imran, Muhammad
During times of crisis, social media platforms play a vital role in facilitating communication and coordinating resources. Amidst chaos and uncertainty, communities often rely on these platforms to share urgent pleas for help, extend support, and organize relief efforts. However, the sheer volume of conversations during such periods, which can escalate to unprecedented levels, necessitates the automated identification and matching of requests and offers to streamline relief operations. This study addresses the challenge of efficiently identifying and matching assistance requests and offers on social media platforms during emergencies. We propose CReMa (Crisis Response Matcher), a systematic approach that integrates textual, temporal, and spatial features for multi-lingual request-offer matching. By leveraging CrisisTransformers, a set of pre-trained models specific to crises, and a cross-lingual embedding space, our methodology enhances the identification and matching tasks while outperforming strong baselines such as RoBERTa, MPNet, and BERTweet, in classification tasks, and Universal Sentence Encoder, Sentence Transformers in crisis embeddings generation tasks. We introduce a novel multi-lingual dataset that simulates scenarios of help-seeking and offering assistance on social media across the 16 most commonly used languages in Australia. We conduct comprehensive cross-lingual experiments across these 16 languages, also while examining trade-offs between multiple vector search strategies and accuracy. Additionally, we analyze a million-scale geotagged global dataset to comprehend patterns in relation to seeking help and offering assistance on social media. Overall, these contributions advance the field of crisis informatics and provide benchmarks for future research in the area.
A Constraint-Enforcing Reward for Adversarial Attacks on Text Classifiers
Roth, Tom, Unanue, Inigo Jauregi, Abuadbba, Alsharif, Piccardi, Massimo
Text classifiers are vulnerable to adversarial examples -- correctly-classified examples that are deliberately transformed to be misclassified while satisfying acceptability constraints. The conventional approach to finding adversarial examples is to define and solve a combinatorial optimisation problem over a space of allowable transformations. While effective, this approach is slow and limited by the choice of transformations. An alternate approach is to directly generate adversarial examples by fine-tuning a pre-trained language model, as is commonly done for other text-to-text tasks. This approach promises to be much quicker and more expressive, but is relatively unexplored. For this reason, in this work we train an encoder-decoder paraphrase model to generate a diverse range of adversarial examples. For training, we adopt a reinforcement learning algorithm and propose a constraint-enforcing reward that promotes the generation of valid adversarial examples. Experimental results over two text classification datasets show that our model has achieved a higher success rate than the original paraphrase model, and overall has proved more effective than other competitive attacks. Finally, we show how key design choices impact the generated examples and discuss the strengths and weaknesses of the proposed approach.
Optimistic Query Routing in Clustering-based Approximate Maximum Inner Product Search
Bruch, Sebastian, Krishnan, Aditya, Nardini, Franco Maria
Clustering-based nearest neighbor search is a simple yet effective method in which data points are partitioned into geometric shards to form an index, and only a few shards are searched during query processing to find an approximate set of top-$k$ vectors. Even though the search efficacy is heavily influenced by the algorithm that identifies the set of shards to probe, it has received little attention in the literature. This work attempts to bridge that gap by studying the problem of routing in clustering-based maximum inner product search (MIPS). We begin by unpacking existing routing protocols and notice the surprising contribution of optimism. We then take a page from the sequential decision making literature and formalize that insight following the principle of ``optimism in the face of uncertainty.'' In particular, we present a new framework that incorporates the moments of the distribution of inner products within each shard to optimistically estimate the maximum inner product. We then present a simple instance of our algorithm that uses only the first two moments to reach the same accuracy as state-of-the-art routers such as \scann by probing up to $50%$ fewer points on a suite of benchmark MIPS datasets. Our algorithm is also space-efficient: we design a sketch of the second moment whose size is independent of the number of points and in practice requires storing only $O(1)$ additional vectors per shard.
Explainable Human-AI Interaction: A Planning Perspective
Sreedharan, Sarath, Kulkarni, Anagha, Kambhampati, Subbarao
From its inception, AI has had a rather ambivalent relationship with humans -- swinging between their augmentation and replacement. Now, as AI technologies enter our everyday lives at an ever increasing pace, there is a greater need for AI systems to work synergistically with humans. One critical requirement for such synergistic human-AI interaction is that the AI systems be explainable to the humans in the loop. To do this effectively, AI agents need to go beyond planning with their own models of the world, and take into account the mental model of the human in the loop. Drawing from several years of research in our lab, we will discuss how the AI agent can use these mental models to either conform to human expectations, or change those expectations through explanatory communication. While the main focus of the book is on cooperative scenarios, we will point out how the same mental models can be used for obfuscation and deception. Although the book is primarily driven by our own research in these areas, in every chapter, we will provide ample connections to relevant research from other groups.
Large Neighborhood Prioritized Search for Combinatorial Optimization with Answer Set Programming
Sugimori, Irumi, Inoue, Katsumi, Nabeshima, Hidetomo, Schaub, Torsten, Soh, Takehide, Tamura, Naoyuki, Banbara, Mutsunori
We propose Large Neighborhood Prioritized Search (LNPS) for solving combinatorial optimization problems in Answer Set Programming (ASP). LNPS is a metaheuristic that starts with an initial solution and then iteratively tries to find better solutions by alternately destroying and prioritized searching for a current solution. Due to the variability of neighborhoods, LNPS allows for flexible search without strongly depending on the destroy operators. We present an implementation of LNPS based on ASP. The resulting heulingo solver demonstrates that LNPS can significantly enhance the solving performance of ASP for optimization. Furthermore, we establish the competitiveness of our LNPS approach by empirically contrasting it to (adaptive) large neighborhood search.