 Instructional Material


Multimodal Pretrained Models for Sequential Decision-Making: Synthesis, Verification, Grounding, and Perception

arXiv.org Artificial Intelligence

Recently developed pretrained models can encode rich world knowledge expressed in multiple modalities, such as text and images. However, the outputs of these models cannot be integrated into algorithms to solve sequential decision-making tasks. We develop an algorithm that utilizes the knowledge from pretrained models to construct and verify controllers for sequential decision-making tasks, and to ground these controllers to task environments through visual observations. In particular, the algorithm queries a pretrained model with a user-provided, text-based task description and uses the model's output to construct an automaton-based controller that encodes the model's task-relevant knowledge. It then verifies whether the knowledge encoded in the controller is consistent with other independently available knowledge, which may include abstract information on the environment or user-provided specifications. If this verification step discovers any inconsistency, the algorithm automatically refines the controller to resolve the inconsistency. Next, the algorithm leverages the vision and language capabilities of pretrained models to ground the controller to the task environment. It collects image-based observations from the task environment and uses the pretrained model to link these observations to the text-based control logic encoded in the controller (e.g., actions and conditions that trigger the actions). We propose a mechanism to ensure the controller satisfies the user-provided specification even when perceptual uncertainties are present. We demonstrate the algorithm's ability to construct, verify, and ground automaton-based controllers through a suite of real-world tasks, including daily life and robot manipulation tasks.
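As a rough illustration of what an automaton-based controller and a consistency check might look like, the sketch below stores the controller as a plain transition table and searches its reachable transitions for any (condition, action) pair that a specification forbids. The abstract does not describe the data structures or the verification procedure, so the state names, the `forbidden` set, and the `violations` helper are all hypothetical; a real system would more likely use a formal model checker.

```python
from collections import deque

# Hypothetical controller: transitions[state] = [(condition, action, next_state), ...]
controller = {
    "wait": [("green light", "cross", "done"), ("red light", "stay", "wait")],
    "done": [],
}

# A deliberately faulty variant that crosses on a red light.
faulty_controller = {
    "wait": [("green light", "cross", "done"), ("red light", "cross", "done")],
    "done": [],
}


def violations(transitions, start, forbidden):
    """Return reachable (state, condition, action) triples whose
    (condition, action) pair appears in the forbidden set."""
    seen = {start}
    queue = deque([start])
    bad = []
    while queue:
        state = queue.popleft()
        for condition, action, next_state in transitions[state]:
            if (condition, action) in forbidden:
                bad.append((state, condition, action))
            if next_state not in seen:
                seen.add(next_state)
                queue.append(next_state)
    return bad


# Assumed specification: never cross while the light is red.
spec = {("red light", "cross")}
ok_result = violations(controller, "wait", spec)
bad_result = violations(faulty_controller, "wait", spec)
```

A non-empty result marks the transitions that would need refinement; how the refinement is carried out is left open here.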


Adapt and Decompose: Efficient Generalization of Text-to-SQL via Domain Adapted Least-To-Most Prompting

arXiv.org Artificial Intelligence

Cross-domain and cross-compositional generalization of Text-to-SQL semantic parsing is a challenging task. Existing Large Language Model (LLM) based solutions rely on inference-time retrieval of few-shot exemplars from the training set to synthesize a run-time prompt for each Natural Language (NL) test query. In contrast, we devise an algorithm that performs offline sampling of a minimal set of few-shot exemplars from the training data, with complete coverage of SQL clauses, operators, and functions, and maximal domain coverage within the allowed token length. This allows for synthesis of a fixed Generic Prompt (GP), with a diverse set of exemplars common across NL test queries, avoiding expensive test-time exemplar retrieval. We further auto-adapt the GP to the target database domain (DA-GP) to better handle cross-domain generalization, followed by a decomposed Least-To-Most-Prompting (LTMP-DA-GP) to handle cross-compositional generalization. The synthesis of LTMP-DA-GP is an offline task, performed once per new database with minimal human intervention. Our approach demonstrates superior performance on the KaggleDBQA dataset, designed to evaluate generalizability for the Text-to-SQL task. We further showcase consistent performance improvement of LTMP-DA-GP over GP across the LLMs and databases of KaggleDBQA, highlighting the efficacy and model-agnostic benefits of our prompt-based adapt-and-decompose approach.
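One way to picture offline exemplar selection with coverage under a token limit is a greedy, budgeted set-cover heuristic. The sketch below is an assumption for illustration only: the abstract does not state which sampling algorithm is used, and the exemplar names, feature sets, and token costs are invented.

```python
from typing import Dict, List, Set


def select_exemplars(
    pool: Dict[str, Set[str]],
    token_cost: Dict[str, int],
    budget: int,
) -> List[str]:
    """Greedily pick exemplars that cover the most still-uncovered SQL
    features per token, stopping when everything is covered or no
    remaining exemplar fits in the token budget."""
    uncovered: Set[str] = set().union(*pool.values()) if pool else set()
    chosen: List[str] = []
    spent = 0
    while uncovered:
        best, best_gain = None, 0.0
        for name, features in pool.items():
            if name in chosen or spent + token_cost[name] > budget:
                continue
            gain = len(features & uncovered) / token_cost[name]
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:  # nothing useful fits any more
            break
        chosen.append(best)
        spent += token_cost[best]
        uncovered -= pool[best]
    return chosen


# Hypothetical training exemplars tagged with the SQL features they use.
pool = {
    "q1": {"SELECT", "WHERE"},
    "q2": {"SELECT", "GROUP BY", "COUNT"},
    "q3": {"WHERE"},
    "q4": {"ORDER BY", "LIMIT"},
}
token_cost = {"q1": 20, "q2": 25, "q3": 15, "q4": 20}
chosen = select_exemplars(pool, token_cost, budget=80)
```

Because the selection runs once offline, the resulting prompt can be reused for every test query instead of retrieving exemplars per query.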


Path Signatures for Diversity in Probabilistic Trajectory Optimisation

arXiv.org Artificial Intelligence

Motion planning can be cast as a trajectory optimisation problem where a cost is minimised as a function of the trajectory being generated. In complex environments with several obstacles and complicated geometry, this optimisation problem is usually difficult to solve and prone to local minima. However, recent advancements in computing hardware allow for parallel trajectory optimisation where multiple solutions are obtained simultaneously, each initialised from a different starting point. Unfortunately, without a strategy preventing two solutions from collapsing onto each other, naive parallel optimisation can suffer from mode collapse, diminishing the efficiency of the approach and the likelihood of finding a global solution. In this paper we leverage recent advances in the theory of rough paths to devise an algorithm for parallel trajectory optimisation that promotes diversity over the range of solutions, therefore avoiding mode collapse and achieving better global properties. Trajectory optimisation is one of the key tools in robotic motion, used to find control signals or paths in obstacle-cluttered environments that allow the robot to perform desired tasks. These trajectories can represent a variety of ... These can be roughly divided into two main paradigms: sampling-based and trajectory optimisation algorithms. Sampling-based planning [2] is a class of planners with probabilistically complete and asymptotically optimal guarantees [3]. These approaches decompose the planning problem into a series of sequential decision-making problems with a tree-based [4] or graph-based [5], [6] approach.


Improving Performance in Continual Learning Tasks using Bio-Inspired Architectures

arXiv.org Artificial Intelligence

The ability to learn continuously from an incoming data stream without catastrophic forgetting is critical to designing intelligent systems. Many approaches to continual learning rely on stochastic gradient descent and its variants that employ global error updates, and hence need to adopt strategies such as memory buffers or replay to circumvent its stability, greed, and short-term memory limitations. To address this limitation, we have developed a biologically inspired lightweight neural network architecture that incorporates synaptic plasticity mechanisms and neuromodulation and hence learns through local error signals to enable online continual learning without stochastic gradient descent. Our approach leads to superior online continual learning performance on Split-MNIST, Split-CIFAR-10, and Split-CIFAR-100 datasets compared to other memory-constrained learning approaches and matches that of the state-of-the-art memory-intensive replay-based approaches. We further demonstrate the effectiveness of our approach by integrating key design concepts into other backpropagation-based continual learning algorithms, significantly improving their accuracy. Our results provide compelling evidence for the importance of incorporating biological principles into machine learning models and offer insights into how we can leverage them to design more efficient and robust systems for online continual learning. Online continual learning addresses the scenario where a system has to learn and process data that are continuously streamed, often without restrictions in terms of the distribution of data within and across tasks and without clearly identified task boundaries (Mai et al., 2021; Chen et al., 2020; Aljundi et al., 2019a). Online continual learning algorithms seek to mitigate catastrophic forgetting at both the data-instance and task level (Chen et al., 2020).
In some cases, however, such as on-chip learning at the edge, additional considerations such as resource limitations in the hardware, data privacy, or data security are also important for online continual learning. A key challenge of online continual learning is that it runs counter to the optimal conditions required for optimization using stochastic gradient descent (SGD) (Parisi et al., 2019), which struggles with non-stationary data streams (Lindsey & Litwin-Kumar, 2020). In contrast, biological systems excel at online continual learning. Inspired by the structure and functionality of the mammalian brain, several approaches have adopted replay strategies to counteract catastrophic forgetting during non-stationary tasks.


Decentralization and Acceleration Enables Large-Scale Bundle Adjustment

arXiv.org Artificial Intelligence

Scaling to arbitrarily large bundle adjustment problems requires data and compute to be distributed across multiple devices. Centralized methods in prior works are only able to solve small or medium size problems due to overhead in computation and communication. In this paper, we present a fully decentralized method that alleviates computation and communication bottlenecks to solve arbitrarily large bundle adjustment problems. We achieve this by reformulating the reprojection error and deriving a novel surrogate function that decouples optimization variables from different devices. This function makes it possible to use majorization minimization techniques and reduces bundle adjustment to independent optimization subproblems that can be solved in parallel. We further apply Nesterov's acceleration and adaptive restart to improve convergence while maintaining its theoretical guarantees. Despite limited peer-to-peer communication, our method has provable convergence to first-order critical points under mild conditions. On extensive benchmarks with public datasets, our method converges much faster than decentralized baselines with similar memory usage and communication load. Compared to centralized baselines using a single device, our method, while being decentralized, yields more accurate solutions with significant speedups of up to 953.7x over Ceres and 174.6x over DeepLM. Code: https://joeaortiz.github.io/daba.
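To make "Nesterov's acceleration and adaptive restart" concrete, the sketch below applies a generic accelerated gradient method with a function-value restart to a small ill-conditioned quadratic. It is not the paper's decentralized solver: the objective, step size, and the function name `nesterov_with_restart` are assumptions chosen for illustration, and the restart rule shown is one common variant.

```python
def nesterov_with_restart(f, grad, x0, step, iters=2000):
    """Accelerated gradient descent that drops its momentum whenever an
    accelerated step would increase the objective (function-value restart)."""
    x = list(x0)   # current iterate
    y = list(x0)   # extrapolated point where the gradient is evaluated
    t = 1.0        # momentum parameter
    for _ in range(iters):
        g = grad(y)
        x_new = [yi - step * gi for yi, gi in zip(y, g)]
        if f(x_new) > f(x):
            # Restart: forget the momentum and take a plain gradient step.
            t = 1.0
            g = grad(x)
            x_new = [xi - step * gi for xi, gi in zip(x, g)]
        t_new = (1.0 + (1.0 + 4.0 * t * t) ** 0.5) / 2.0
        beta = (t - 1.0) / t_new  # equals 0 right after a restart
        y = [xn + beta * (xn - xi) for xn, xi in zip(x_new, x)]
        x, t = x_new, t_new
    return x


# Hypothetical test problem: f(x) = 0.5 * sum(a_i * x_i^2), condition number 100.
a = [1.0, 100.0]
f = lambda x: 0.5 * sum(ai * xi * xi for ai, xi in zip(a, x))
grad = lambda x: [ai * xi for ai, xi in zip(a, x)]

x0 = [1.0, 1.0]
x_star = nesterov_with_restart(f, grad, x0, step=1.0 / max(a))
```

With the step size set to the inverse of the largest curvature, the fallback gradient step never increases the objective, so the iterates decrease monotonically while still benefiting from momentum between restarts.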


Current and Future Challenges in Knowledge Representation and Reasoning

arXiv.org Artificial Intelligence

Knowledge Representation and Reasoning is a central, longstanding, and active area of Artificial Intelligence. Over the years it has evolved significantly; more recently it has been challenged and complemented by research in areas such as machine learning and reasoning under uncertainty. In July 2022 a Dagstuhl Perspectives workshop was held on Knowledge Representation and Reasoning. The goal of the workshop was to describe the state of the art in the field, including its relation with other areas, its shortcomings and strengths, together with recommendations for future progress. We developed this manifesto based on the presentations, panels, working groups, and discussions that took place at the Dagstuhl Workshop. It is a declaration of our views on Knowledge Representation: its origins, goals, milestones, and current foci; its relation to other disciplines, especially to Artificial Intelligence; and on its challenges, along with key priorities for the next decade.


Machine Learning for Infectious Disease Risk Prediction: A Survey

arXiv.org Artificial Intelligence

Infectious diseases, either emerging or long-lasting, place numerous people at risk and bring heavy public health burdens worldwide. In the fight against infectious diseases, predicting epidemic risk by modeling disease transmission plays an essential role in preventing and controlling transmission more effectively. In this paper, we systematically describe how machine learning can play an essential role in quantitatively characterizing disease transmission patterns and accurately predicting infectious disease risks. First, we introduce the background and motivation of using machine learning for infectious disease risk prediction. Next, we describe the development and components of various machine learning models for infectious disease risk prediction. Specifically, existing models fall into three categories: statistical prediction, data-driven machine learning, and epidemiology-inspired machine learning. Subsequently, we discuss challenges encountered when dealing with model inputs, designing task-oriented objectives, and conducting performance evaluation. Finally, we conclude with a discussion of open questions and future directions.


DaMSTF: Domain Adversarial Learning Enhanced Meta Self-Training for Domain Adaptation

arXiv.org Artificial Intelligence

Self-training has emerged as an important line of research on domain adaptation. By taking the model's predictions as pseudo labels for the unlabeled data, self-training bootstraps the model with pseudo instances in the target domain. However, the prediction errors of pseudo labels (label noise) challenge the performance of self-training. To address this problem, previous approaches only use reliable pseudo instances, i.e., pseudo instances with high prediction confidence, to retrain the model. Although these strategies effectively reduce the label noise, they are prone to missing the hard examples. In this paper, we propose a new self-training framework for domain adaptation, namely Domain adversarial learning enhanced Meta Self-Training Framework (DaMSTF). Firstly, DaMSTF involves meta-learning to estimate the importance of each pseudo instance, so as to simultaneously reduce the label noise and preserve hard examples. Secondly, we design a meta constructor for constructing the meta-validation set, which guarantees the effectiveness of the meta-learning module by improving the quality of the meta-validation set. Thirdly, we find that the meta-learning module suffers from vanishing training guidance and tends to converge to an inferior optimum. To this end, we employ domain adversarial learning as a heuristic neural network initialization method, which can help the meta-learning module converge to a better optimum. Theoretically and experimentally, we demonstrate the effectiveness of the proposed DaMSTF. On the cross-domain sentiment classification task, DaMSTF improves the performance of BERT by nearly 4% on average.
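The confidence-filtering baseline that the abstract attributes to previous approaches can be sketched in a few lines. This is only the baseline being criticised, not DaMSTF itself; the function name, the 0.9 threshold, and the example probabilities are assumptions for illustration.

```python
def select_pseudo_labels(probs, threshold=0.9):
    """Keep (index, predicted_label) for unlabeled instances whose top
    class probability reaches the confidence threshold."""
    selected = []
    for idx, p in enumerate(probs):
        label = max(range(len(p)), key=p.__getitem__)  # argmax over classes
        if p[label] >= threshold:
            selected.append((idx, label))
    return selected


# Hypothetical class probabilities predicted on three target-domain instances.
probs = [
    [0.95, 0.05],  # confident: kept with label 0
    [0.55, 0.45],  # uncertain (a "hard example"): discarded
    [0.08, 0.92],  # confident: kept with label 1
]
pseudo = select_pseudo_labels(probs)
```

The second instance is dropped even though it may be the most informative one, which is the weakness that weighting every pseudo instance by an estimated importance is meant to address.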


Graphologue: Exploring Large Language Model Responses with Interactive Diagrams

arXiv.org Artificial Intelligence

Large language models (LLMs) have recently soared in popularity due to their ease of access and the unprecedented ability to synthesize text responses to diverse user questions. However, LLMs like ChatGPT present significant limitations in supporting complex information tasks due to the insufficient affordances of the text-based medium and linear conversational structure. Through a formative study with ten participants, we found that LLM interfaces often present long-winded responses, making it difficult for people to quickly comprehend and interact flexibly with various pieces of information, particularly during more complex tasks. We present Graphologue, an interactive system that converts text-based responses from LLMs into graphical diagrams to facilitate information-seeking and question-answering tasks. Graphologue employs novel prompting strategies and interface designs to extract entities and relationships from LLM responses and constructs node-link diagrams in real-time. Further, users can interact with the diagrams to flexibly adjust the graphical presentation and to submit context-specific prompts to obtain more information. Utilizing diagrams, Graphologue enables graphical, non-linear dialogues between humans and LLMs, facilitating information exploration, organization, and comprehension.


'So important': UK minister endorses Google's training drive in AI arms race

The Guardian

A larger-than-life Michelle Donelan beams on to a screen in Google's London headquarters. The UK science and innovation secretary is appearing via video to praise the US tech behemoth for its plans to equip workers and bosses with basic skills in artificial intelligence (AI). "The recent explosion in the use of AI tools like ChatGPT and Google's Bard show that we are on the cusp of a new and exciting era in artificial intelligence, and it is one that will dramatically improve people's lives," says Donelan. Google's "ambitious" training programme is "so important" and "exceptional in its breadth", she gushes in a five-minute video, filmed in her ministerial office. Welcome to the AI arms race, where nations are bending over backwards to attract cash and research into the nascent technology.