LLMs Process Lists With General Filter Heads
Sharma, Arnab Sen, Rogers, Giordano, Shapira, Natalie, Bau, David
We investigate the mechanisms underlying a range of list-processing tasks in LLMs and find that LLMs have learned to encode a compact, causal representation of a general filtering operation that mirrors the generic "filter" function of functional programming. Using causal mediation analysis on a diverse set of list-processing tasks, we find that a small number of attention heads, which we dub filter heads, encode a compact representation of the filtering predicate in their query states at certain tokens. We demonstrate that this predicate representation is general and portable: it can be extracted and reapplied to execute the same filtering operation on different collections, presented in different formats or languages, or even in different tasks. However, we also identify situations where transformer LMs can exploit a different strategy for filtering: eagerly evaluating whether an item satisfies the predicate and storing this intermediate result as a flag directly in the item representations. Our results reveal that transformer LMs can develop human-interpretable implementations of abstract computational operations that generalize in ways surprisingly similar to traditional functional programming patterns.
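For readers less familiar with the functional-programming analogy, the following minimal sketch contrasts, in ordinary Python, the two strategies the abstract describes: a generic filter whose predicate is a standalone object applied at filter time (and so can be reused on a new collection), and an eager variant that evaluates the predicate up front and stores a per-item flag. It illustrates the programming patterns only, not the attention-head mechanism; the function names and the `is_fruit` predicate are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def lazy_filter(items: Iterable[T], predicate: Callable[[T], bool]) -> List[T]:
    """Generic filter: the predicate is a standalone, reusable object that is
    applied to each item only when the filter runs."""
    return [x for x in items if predicate(x)]


def eager_flags(items: Iterable[T], predicate: Callable[[T], bool]) -> List[Tuple[T, bool]]:
    """Eager variant: evaluate the predicate as each item is seen and store
    the boolean result as a flag next to the item."""
    return [(x, predicate(x)) for x in items]


def select_flagged(flagged: List[Tuple[T, bool]]) -> List[T]:
    """Retrieval reads only the stored flags; the predicate is no longer needed."""
    return [x for x, keep in flagged if keep]


def is_fruit(word: str) -> bool:
    """Hypothetical example predicate."""
    return word in {"apple", "banana", "cherry"}


# The same predicate object is reused on two different collections.
print(lazy_filter(["apple", "chair", "banana"], is_fruit))  # ['apple', 'banana']
print(lazy_filter(["table", "cherry"], is_fruit))  # ['cherry']
print(select_flagged(eager_flags(["apple", "chair"], is_fruit)))  # ['apple']
```

`lazy_filter` behaves like Python's built-in `filter(predicate, items)` materialized into a list; the eager version trades predicate portability for cheap retrieval once the flags exist.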
Holistic Semantic Representation for Navigational Trajectory Generation
Cao, Ji, Zheng, Tongya, Guo, Qinghong, Wang, Yu, Dai, Junshu, Liu, Shunyu, Yang, Jie, Song, Jie, Song, Mingli
Trajectory generation has garnered significant attention from researchers in spatio-temporal analysis, as it can produce large volumes of synthetic human mobility trajectories that enhance user privacy and alleviate data scarcity. However, existing trajectory generation methods often improve generation quality from a single perspective and lack a comprehensive semantic understanding across scales. We therefore develop a HOlistic SEmantic Representation (HOSER) framework for navigational trajectory generation. Given an origin-destination (OD) pair and the starting time of a latent trajectory, we first propose a Road Network Encoder to expand the receptive field of road- and zone-level semantics. Second, we design a Multi-Granularity Trajectory Encoder to integrate the spatio-temporal semantics of the generated trajectory at both the point and trajectory levels. Finally, we employ a Destination-Oriented Navigator to seamlessly integrate destination-oriented guidance. Extensive experiments on three real-world datasets demonstrate that HOSER outperforms state-of-the-art baselines by a significant margin. Moreover, the model's performance in few-shot and zero-shot scenarios further verifies the effectiveness of our holistic semantic representation.
Transformers, parallel computation, and logarithmic depth
Sanford, Clayton, Hsu, Daniel, Telgarsky, Matus
The transformer (Vaswani et al., 2017) has emerged as the dominant neural architecture for many sequential modeling tasks such as machine translation (Radford et al., 2019) and protein folding (Jumper et al., 2021). Reasons for the success of transformers include suitability for modern hardware and training stability: unlike in recurrent models, inference and training can be efficiently parallelized, and training is less vulnerable to vanishing and exploding gradients. However, the advantages of transformers over other neural architectures can be understood more fundamentally through the lens of representation, which regards neural nets as parameterized functions and asks what they can efficiently compute. Many previous theoretical studies of transformers establish (approximation-theoretic and computational) universality properties, but only at large model sizes (Yun et al., 2020; Pérez et al., 2021). These results are not unique to transformers and reveal little about which tasks can be solved in a size-efficient manner.
Embracing Background Knowledge in the Analysis of Actual Causality: An Answer Set Programming Approach
Gelfond, Michael, Fandinno, Jorge, Balai, Evgenii
This paper presents a rich knowledge representation language aimed at formalizing causal knowledge. We use this language to formalize, directly and accurately, common benchmark examples from the literature on actual causality. We then present a definition of cause and use it to analyze the actual causes of changes with respect to the sequences of actions that represent those examples.
HMD-AMP: Protein Language-Powered Hierarchical Multi-label Deep Forest for Annotating Antimicrobial Peptides
Yu, Qinze, Dong, Zhihang, Fan, Xingyu, Zong, Licheng, Li, Yu
Identifying the targets of an antimicrobial peptide is a fundamental step in studying the innate immune response and combating antibiotic resistance, and, more broadly, in precision medicine and public health. Extensive statistical and computational work has addressed (i) whether a peptide is an antimicrobial peptide (AMP) or a non-AMP and (ii) which targets (Gram-positive, Gram-negative, etc.) these sequences are effective against. Although deep learning methods exist for this problem, most of them cannot handle the small AMP classes (anti-insect, anti-parasite, etc.). More importantly, some AMPs have multiple targets, which previous methods fail to consider. In this study, we build a diverse and comprehensive multi-label protein sequence database by collecting and cleaning amino acid sequences from various AMP databases. To generate efficient representations and features for the small-class data, we take advantage of a protein language model trained on 250 million protein sequences. Based on that, we develop an end-to-end hierarchical multi-label deep forest framework, HMD-AMP, to annotate AMPs comprehensively. After identifying an AMP, it further predicts which of eleven available target classes the AMP can effectively kill. Extensive experiments suggest that our framework outperforms state-of-the-art models in both the binary classification task and the multi-label classification task, especially on the minor classes. The model is robust to reduced features and small perturbations and produces promising results. We believe HMD-AMP contributes both to future wet-lab investigations of the innate structural properties of different antimicrobial peptides and to building promising empirical underpinnings for precision medicine with antibiotics.
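The hierarchical, multi-label decision described above (first AMP vs. non-AMP, then possibly several targets) can be sketched as a two-level cascade. This is a minimal illustration under stated assumptions, not HMD-AMP itself: the deep-forest models over protein-language-model features are replaced by hypothetical callables that return probabilities, only three toy target labels stand in for the eleven classes, and the 0.5 threshold is an assumption.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical stand-in: a "model" maps a feature vector to a probability.
Model = Callable[[Sequence[float]], float]


def annotate(
    features: Sequence[float],
    amp_model: Model,
    target_models: Dict[str, Model],
    threshold: float = 0.5,
) -> Optional[List[str]]:
    """Two-level annotation: AMP vs. non-AMP first, then multi-label targets."""
    # Level 1: binary decision. Non-AMPs receive no target labels at all.
    if amp_model(features) < threshold:
        return None
    # Level 2: each target is thresholded independently (not argmax),
    # so a single peptide can receive several labels.
    return [name for name, model in target_models.items() if model(features) >= threshold]


# Hypothetical toy models that simply read one coordinate of a 4-d feature vector.
toy_amp: Model = lambda f: f[0]
toy_targets: Dict[str, Model] = {
    "gram_positive": lambda f: f[1],
    "gram_negative": lambda f: f[2],
    "anti_fungal": lambda f: f[3],
}

print(annotate([0.9, 0.8, 0.7, 0.1], toy_amp, toy_targets))  # ['gram_positive', 'gram_negative']
print(annotate([0.2, 0.9, 0.9, 0.9], toy_amp, toy_targets))  # None
```

Thresholding each target separately is what makes the second level multi-label: a peptide with two confident targets keeps both, which a single softmax-and-argmax classifier could not express.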
Optimal Solving of Constrained Path-Planning Problems with Graph Convolutional Networks and Optimized Tree Search
Osanlou, Kevin, Bursuc, Andrei, Guettier, Christophe, Cazenave, Tristan, Jacopin, Eric
Learning-based methods are gaining prominence in planning. However, very few approaches address learning-assisted constrained path-planning on graphs, despite multiple practical downstream applications. One such application is constrained path-planning for Autonomous Unmanned Ground Vehicles (AUGV), which are typically deployed in disaster relief or search-and-rescue missions. In off-road environments, the AUGV must dynamically optimize a source-destination path under various operational constraints, several of which are difficult to predict in advance and must be addressed online. We propose a hybrid planner that combines machine learning models with an optimal solver. More specifically, a graph convolutional network (GCN) is used to assist a branch and bound (B&B) algorithm in handling the constraints. We conduct experiments on realistic scenarios and show that GCN support enables substantial speedup and smoother scaling to harder problems.
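To make the B&B side concrete, here is a minimal depth-first branch and bound for a resource-constrained shortest simple path. As an assumption, the learned component is represented by a hypothetical `guide(node, neighbor)` score that only decides which branch to explore first; the abstract does not specify how the GCN is wired into B&B, and the graph, edge costs, and budget below are illustrative.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

# graph[u] = list of (v, cost, resource_use) edges
Graph = Dict[str, List[Tuple[str, float, float]]]
Guide = Callable[[str, str], float]  # hypothetical learned score: lower = explore first


def branch_and_bound(
    graph: Graph,
    source: str,
    target: str,
    budget: float,
    guide: Optional[Guide] = None,
) -> Tuple[Optional[List[str]], float]:
    """Minimum-cost simple path whose total resource use stays within budget."""
    best_cost = math.inf
    best_path: Optional[List[str]] = None

    def dfs(node: str, path: List[str], cost: float, used: float) -> None:
        nonlocal best_cost, best_path
        # Bound: with non-negative costs, a partial path that already costs as
        # much as the incumbent, or breaks the budget, cannot improve on it.
        if cost >= best_cost or used > budget:
            return
        if node == target:
            best_cost, best_path = cost, list(path)
            return
        edges = graph.get(node, [])
        if guide is not None:
            # Branching order only; it never changes which path is optimal.
            edges = sorted(edges, key=lambda e: guide(node, e[0]))
        for nxt, c, r in edges:
            if nxt in path:
                continue  # keep paths simple (no revisits)
            path.append(nxt)
            dfs(nxt, path, cost + c, used + r)
            path.pop()

    dfs(source, [source], 0.0, 0.0)
    return best_path, best_cost


graph: Graph = {
    "S": [("A", 1.0, 5.0), ("B", 4.0, 1.0)],
    "A": [("T", 1.0, 5.0)],
    "B": [("T", 1.0, 1.0)],
}
print(branch_and_bound(graph, "S", "T", budget=20.0))  # (['S', 'A', 'T'], 2.0)
print(branch_and_bound(graph, "S", "T", budget=3.0))  # (['S', 'B', 'T'], 5.0)
```

Because the search is still exhaustive up to pruning, any guide returns the same optimal answer; a good guide simply finds a strong incumbent earlier, so more branches are cut by the `cost >= best_cost` bound.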
Face-work for Human-Agent Joint Decision-Making
We propose a method to integrate face-work, a common social ritual related to trust, into a decision-making agent that works collaboratively with a human. Face-work is a set of trust-building behaviors designed to "save face" or prevent others from "losing face." This paper describes the design of a decision-making process that explicitly considers face-work as part of its action selection. We also present a simulated robot arm deployed in an online environment that can be used to evaluate the proposed method.
Optimal Solutions to Large Logistics Planning Domain Problems
Paul, Gerald (Boston University) | Röger, Gabriele (University of Basel) | Keller, Thomas (University of Basel) | Helmert, Malte (University of Basel)
We propose techniques for efficiently determining optimal solutions to large logistics planning domain problems. We map a problem instance to a directed graph and show that an optimal solution needs no more than one vehicle per weakly connected component of the graph. We propose techniques for efficiently finding the vehicles that must be employed in an optimal solution. We also develop a strong admissible heuristic based on the analysis of a directed graph whose cycles represent situations in the problem state in which a vehicle must visit a location more than once. To the best of our knowledge, ours is the first method that determines optimal solutions for large logistics instances (including the largest instances in the IPC 1998 and IPC 2000 problem sets).
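The weakly-connected-component step can be sketched with a union-find structure that ignores edge direction. The graph construction here is a hypothetical assumption, since the abstract does not spell it out: each location is a node and each package contributes an edge from its origin to its destination.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def weakly_connected_components(edges: Iterable[Tuple[Hashable, Hashable]]) -> List[Set[Hashable]]:
    """Group the nodes of a directed graph into weakly connected components
    using union-find; edge direction is ignored."""
    parent: Dict[Hashable, Hashable] = {}

    def find(x: Hashable) -> Hashable:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[rv] = ru  # merge the two components

    groups: Dict[Hashable, Set[Hashable]] = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


# Hypothetical instance: each package is an edge from its origin to its destination.
packages = [("L1", "L2"), ("L3", "L2"), ("L4", "L5")]
components = weakly_connected_components(packages)
print(len(components))  # 2
```

Here `L3 -> L2` joins `L1`, `L2`, and `L3` into one component even though no directed path links `L1` and `L3`. Assuming the stated one-vehicle-per-component result applies to this construction, the toy instance would need at most two vehicles.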