Goto

Collaborating Authors

 Instructional Material


LogicLLM: Exploring Self-supervised Logic-enhanced Training for Large Language Models

arXiv.org Artificial Intelligence

Existing efforts to improve logical reasoning ability of language models have predominantly relied on supervised fine-tuning, hindering generalization to new domains and/or tasks. The development of Large Langauge Models (LLMs) has demonstrated the capacity of compressing abundant knowledge into a single proxy, enabling them to tackle multiple tasks effectively. Our preliminary experiments, nevertheless, show that LLMs do not show capability on logical reasoning. The performance of LLMs on logical reasoning benchmarks is far behind the existing state-of-the-art baselines. In this paper, we make the first attempt to investigate the feasibility of incorporating logical knowledge through self-supervised post-training, and activating it via in-context learning, which we termed as LogicLLM. Specifically, we devise an auto-regressive objective variant of MERIt and integrate it with two LLM series, i.e., FLAN-T5 and LLaMA, with parameter size ranging from 3 billion to 13 billion. The results on two challenging logical reasoning benchmarks demonstrate the effectiveness of LogicLLM. Besides, we conduct extensive ablation studies to analyze the key factors in designing logic-oriented proxy tasks.


Deep Stochastic Processes via Functional Markov Transition Operators

arXiv.org Artificial Intelligence

We introduce Markov Neural Processes (MNPs), a new class of Stochastic Processes (SPs) which are constructed by stacking sequences of neural parameterised Markov transition operators in function space. We prove that these Markov transition operators can preserve the exchangeability and consistency of SPs. Therefore, the proposed iterative construction adds substantial flexibility and expressivity to the original framework of Neural Processes (NPs) without compromising consistency or adding restrictions. Our experiments demonstrate clear advantages of MNPs over baseline models on a variety of tasks.


PROTO: Iterative Policy Regularized Offline-to-Online Reinforcement Learning

arXiv.org Artificial Intelligence

Offline-to-online reinforcement learning (RL), by combining the benefits of offline pretraining and online finetuning, promises enhanced sample efficiency and policy performance. However, existing methods, effective as they are, suffer from suboptimal performance, limited adaptability, and unsatisfactory computational efficiency. We propose a novel framework, PROTO, which overcomes the aforementioned limitations by augmenting the standard RL objective with an iteratively evolving regularization term. Performing a trust-region-style update, PROTO yields stable initial finetuning and optimal final performance by gradually evolving the regularization term to relax the constraint strength. By adjusting only a few lines of code, PROTO can bridge any offline policy pretraining and standard off-policy RL finetuning to form a powerful offline-to-online RL pathway, birthing great adaptability to diverse methods. Simple yet elegant, PROTO imposes minimal additional computation and enables highly efficient online finetuning. Extensive experiments demonstrate that PROTO achieves superior performance over SOTA baselines, offering an adaptable and efficient offline-to-online RL framework.


Dealing with Cross-Task Class Discrimination in Online Continual Learning

arXiv.org Artificial Intelligence

Existing continual learning (CL) research regards catastrophic forgetting (CF) as almost the only challenge. This paper argues for another challenge in class-incremental learning (CIL), which we call cross-task class discrimination (CTCD),~i.e., how to establish decision boundaries between the classes of the new task and old tasks with no (or limited) access to the old task data. CTCD is implicitly and partially dealt with by replay-based methods. A replay method saves a small amount of data (replay data) from previous tasks. When a batch of current task data arrives, the system jointly trains the new data and some sampled replay data. The replay data enables the system to partially learn the decision boundaries between the new classes and the old classes as the amount of the saved data is small. However, this paper argues that the replay approach also has a dynamic training bias issue which reduces the effectiveness of the replay data in solving the CTCD problem. A novel optimization objective with a gradient-based adaptive method is proposed to dynamically deal with the problem in the online CL process. Experimental results show that the new method achieves much better results in online CL.


Support Vector Machine Guided Reproducing Kernel Particle Method for Image-Based Modeling of Microstructures

arXiv.org Artificial Intelligence

This work presents an approach for automating the discretization and approximation procedures in constructing digital representations of composites from Micro-CT images featuring intricate microstructures. The proposed method is guided by the Support Vector Machine (SVM) classification, offering an effective approach for discretizing microstructural images. An SVM soft margin training process is introduced as a classification of heterogeneous material points, and image segmentation is accomplished by identifying support vectors through a local regularized optimization problem. In addition, an Interface-Modified Reproducing Kernel Particle Method (IM-RKPM) is proposed for appropriate approximations of weak discontinuities across material interfaces. The proposed method modifies the smooth kernel functions with a regularized heavy-side function concerning the material interfaces to alleviate Gibb's oscillations. This IM-RKPM is formulated without introducing duplicated degrees of freedom associated with the interface nodes commonly needed in the conventional treatments of weak discontinuities in the meshfree methods. Moreover, IM-RKPM can be implemented with various domain integration techniques, such as Stabilized Conforming Nodal Integration (SCNI). The extension of the proposed method to 3-dimension is straightforward, and the effectiveness of the proposed method is validated through the image-based modeling of polymer-ceramic composite microstructures.


Robust Representation Learning with Reliable Pseudo-labels Generation via Self-Adaptive Optimal Transport for Short Text Clustering

arXiv.org Artificial Intelligence

Short text clustering is challenging since it takes imbalanced and noisy data as inputs. Existing approaches cannot solve this problem well, since (1) they are prone to obtain degenerate solutions especially on heavy imbalanced datasets, and (2) they are vulnerable to noises. To tackle the above issues, we propose a Robust Short Text Clustering (RSTC) model to improve robustness against imbalanced and noisy data. RSTC includes two modules, i.e., pseudo-label generation module and robust representation learning module. The former generates pseudo-labels to provide supervision for the later, which contributes to more robust representations and correctly separated clusters. To provide robustness against the imbalance in data, we propose self-adaptive optimal transport in the pseudo-label generation module. To improve robustness against the noise in data, we further introduce both class-wise and instance-wise contrastive learning in the robust representation learning module. Our empirical studies on eight short text clustering datasets demonstrate that RSTC significantly outperforms the state-of-the-art models. The code is available at: https://github.com/hmllmh/RSTC.


Barkour: Benchmarking Animal-level Agility with Quadruped Robots

arXiv.org Artificial Intelligence

Abstract--Animals have evolved various agile locomotion strategies, such as sprinting, leaping, and jumping. There is a growing interest in developing legged robots that move like their biological counterparts and show various agile skills to navigate complex environments quickly. Despite the interest, the field lacks systematic benchmarks to measure the performance of control policies and hardware in agility. We introduce the Barkour benchmark, an obstacle course to quantify agility for legged robots. Inspired by dog agility competitions, it consists of diverse obstacles and a time based scoring mechanism. This encourages researchers to develop controllers that not only move fast, but do so in a controllable and versatile way. To set strong baselines, we present two methods for tackling the benchmark. In the first approach, we train specialist locomotion skills using on-policy reinforcement learning methods and combine them with a highlevel navigation controller. In the second approach, we distill the specialist skills into a Transformer-based generalist locomotion policy, named Locomotion-Transformer, that can handle various terrains and adjust the robot's gait based on the perceived There has been a proliferation of legged robot development inspired by animal mobility. An important research question in this field is how to develop a controller that enables legged robots to exhibit animal-level agility while also being able to generalize environments, such as up and down stairs, through bushes, across various obstacles and terrains. Through the exploration and over unpaved roads and rocky or even sandy beaches. of both learning and traditional control-based methods, there Despite advances in robot hardware and control, a major has been significant progress in enabling robots to walk across challenge in the field is the lack of standardized and intuitive a wide range of terrains [10, 21, 20, 1, 27]. These robots are methods for evaluating the effectiveness of locomotion now capable of walking in a variety of indoor and outdoor controllers.


UGIF: UI Grounded Instruction Following

arXiv.org Artificial Intelligence

Smartphone users often find it difficult to navigate myriad menus to perform common tasks such as "How to block calls from unknown numbers?". Currently, help documents with step-by-step instructions are manually written to aid the user. The user experience can be further enhanced by grounding the instructions in the help document to the UI and overlaying a tutorial on the phone UI. To build such tutorials, several natural language processing components including retrieval, parsing, and grounding are necessary, but there isn't any relevant dataset for such a task. Thus, we introduce UGIF-DataSet, a multi-lingual, multi-modal UI grounded dataset for step-by-step task completion on the smartphone containing 4,184 tasks across 8 languages. As an initial approach to this problem, we propose retrieving the relevant instruction steps based on the user's query and parsing the steps using Large Language Models (LLMs) to generate macros that can be executed on-device. The instruction steps are often available only in English, so the challenge includes cross-modal, cross-lingual retrieval of English how-to pages from user queries in many languages and mapping English instruction steps to UI in a potentially different language. We compare the performance of different LLMs including PaLM and GPT-3 and find that the end-to-end task completion rate is 48% for English UI but the performance drops to 32% for other languages. We analyze the common failure modes of existing models on this task and point out areas for improvement.


Addressing Data Distribution Shifts in Online Machine Learning Powered Smart City Applications Using Augmented Test-Time Adaptation

arXiv.org Artificial Intelligence

Data distribution shift is a common problem in machine learning-powered smart city applications where the test data differs from the training data. Augmenting smart city applications with online machine learning models can handle this issue at test time, albeit with high cost and unreliable performance. To overcome this limitation, we propose to endow test-time adaptation with a systematic active fine-tuning (SAF) layer that is characterized by three key aspects: a continuity aspect that adapts to ever-present data distribution shifts; intelligence aspect that recognizes the importance of fine-tuning as a distribution-shift-aware process that occurs at the appropriate time to address the recently detected data distribution shifts; and cost-effectiveness aspect that involves budgeted human-machine collaboration to make relabeling cost-effective and practical for diverse smart city applications. Our empirical results show that our proposed approach outperforms the traditional test-time adaptation by a factor of two.


Generative AI: Implications and Applications for Education

arXiv.org Artificial Intelligence

The launch of ChatGPT in November 2022 precipitated a panic among some educators while prompting qualified enthusiasm from others. Under the umbrella term Generative AI, ChatGPT is an example of a range of technologies for the delivery of computer-generated text, image, and other digitized media. This paper examines the implications for education of one generative AI technology, chatbots responding from large language models, or C-LLM. It reports on an application of a C-LLM to AI review and assessment of complex student work. In a concluding discussion, the paper explores the intrinsic limits of generative AI, bound as it is to language corpora and their textual representation through binary notation. Within these limits, we suggest the range of emerging and potential applications of Generative AI in education.