Collaborating Authors

 Calgary


Navigating Extremes: Dynamic Sparsity in Large Output Space

arXiv.org Artificial Intelligence

In recent years, Dynamic Sparse Training (DST) has emerged as an alternative to post-training pruning for generating efficient models. In principle, DST allows for a more memory-efficient training process, as it maintains sparsity throughout the entire training run. However, current DST implementations fail to capitalize on this in practice. Because sparse matrix multiplication is much less efficient than dense matrix multiplication on GPUs, most implementations simulate sparsity by masking weights. In this paper, we leverage recent advances in semi-structured sparse training to apply DST in the domain of classification with large output spaces, where memory-efficiency is paramount. With a label space of possibly millions of candidates, the classification layer alone will consume several gigabytes of memory. Switching from a dense to a fixed fan-in sparse layer updated with sparse evolutionary training (SET), however, severely hampers training convergence, especially at the largest label spaces. We find that poor gradient flow from the sparse classifier to the dense text encoder makes it difficult to learn good input representations. By employing an intermediate layer or adding an auxiliary training objective, we recover most of the generalisation performance of the dense model. Overall, we demonstrate the applicability and practical benefits of DST in a challenging domain -- characterized by a highly skewed label distribution that differs substantially from typical DST benchmark datasets -- which enables end-to-end training with millions of labels on commodity hardware.
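To make the "fixed fan-in sparse layer updated with SET" idea concrete, the following is a minimal sketch of one prune-and-regrow step for a single output unit. It is not the paper's implementation: the function name `set_update`, the `drop_frac` parameter, and the choice to initialise regrown weights to zero are assumptions made for illustration. The point it shows is that the number of connections per output (the fan-in) stays constant while the connectivity pattern changes.

```python
import random


def set_update(indices, weights, n_inputs, drop_frac, rng):
    """One prune-and-regrow step for one output unit (hypothetical helper).

    indices: distinct input positions this unit is connected to
    weights: weight values aligned with `indices`
    Returns new (indices, weights) with the same fan-in.
    """
    fan_in = len(indices)
    n_drop = int(drop_frac * fan_in)
    if n_drop == 0:
        return list(indices), list(weights)

    # Prune: keep the connections with the largest magnitudes.
    order = sorted(range(fan_in), key=lambda i: abs(weights[i]), reverse=True)
    keep = order[: fan_in - n_drop]
    new_indices = [indices[i] for i in keep]
    new_weights = [weights[i] for i in keep]

    # Regrow: sample the same number of positions that are not currently kept.
    kept = set(new_indices)
    candidates = [j for j in range(n_inputs) if j not in kept]
    for j in rng.sample(candidates, n_drop):
        new_indices.append(j)
        new_weights.append(0.0)  # assumption: regrown weights start at zero
    return new_indices, new_weights


rng = random.Random(0)
idx, w = set_update([0, 3, 5, 7], [0.9, -0.05, 0.4, 0.01],
                    n_inputs=10, drop_frac=0.5, rng=rng)
print(len(idx))  # fan-in is unchanged
```

Because only `fan_in` indices and values are stored per label, memory grows with the number of labels times the fan-in rather than with the full dense width, which is the property the abstract relies on.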


An Approach for Auto Generation of Labeling Functions for Software Engineering Chatbots

arXiv.org Artificial Intelligence

Software engineering (SE) chatbots are increasingly gaining attention for their role in enhancing development processes. At the core of chatbots are the Natural Language Understanding platforms (NLUs), which enable them to comprehend and respond to user queries. Before deploying NLUs, there is a need to train them with labeled data. However, acquiring such labeled data for SE chatbots is challenging due to the scarcity of high-quality datasets. This challenge arises because training SE chatbots requires specialized vocabulary and phrases not found in typical language datasets. Consequently, chatbot developers often resort to manually annotating user queries to gather the data necessary for training effective chatbots, a process that is both time-consuming and resource-intensive. Previous studies propose approaches to support chatbot practitioners in annotating users' posed queries. However, these approaches require human intervention to generate rules, called labeling functions (LFs), that identify and categorize user queries based on specific patterns in the data. To address this issue, we propose an approach to automatically generate LFs by extracting patterns from labeled user queries. We evaluate the effectiveness of our approach by applying it to the queries of four diverse SE datasets (namely AskGit, MSA, Ask Ubuntu, and Stack Overflow) and measure the performance improvement gained from training the NLU on the queries labeled by the generated LFs. We find that the generated LFs effectively label data, with AUC scores of up to 85.3%, and improve the NLU's performance by up to 27.2% across the studied datasets. Furthermore, our results show that the number of LFs used to generate LFs affects the labeling performance. We believe that our approach can save time and resources in labeling users' queries, allowing practitioners to focus on core chatbot functionalities.
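As an illustration of what "generating LFs by extracting patterns from labeled user queries" can look like, here is a minimal sketch that derives keyword-based labeling functions from a handful of labeled queries. It is a simplified stand-in, not the paper's approach: the names `generate_lfs` and `make_keyword_lf`, the thresholds `min_support` and `min_precision`, and the example queries are all hypothetical.

```python
from collections import Counter, defaultdict

ABSTAIN = None  # returned when a labeling function has no opinion


def make_keyword_lf(keyword, label):
    """Build an LF that votes `label` when `keyword` occurs in the query."""
    def lf(text):
        return label if keyword in text.lower().split() else ABSTAIN
    lf.keyword = keyword
    lf.label = label
    return lf


def generate_lfs(labeled_queries, min_support=2, min_precision=0.9):
    """Turn tokens that almost always co-occur with one label into LFs."""
    token_label_counts = defaultdict(Counter)
    for text, label in labeled_queries:
        for tok in set(text.lower().split()):
            token_label_counts[tok][label] += 1

    lfs = []
    for tok, counts in sorted(token_label_counts.items()):
        label, hits = counts.most_common(1)[0]
        total = sum(counts.values())
        if hits >= min_support and hits / total >= min_precision:
            lfs.append(make_keyword_lf(tok, label))
    return lfs


seed = [
    ("how do I revert a commit", "git"),
    ("undo last commit please", "git"),
    ("install package with apt", "ubuntu"),
    ("apt fails to install driver", "ubuntu"),
]
lfs = generate_lfs(seed)
votes = [lf("cannot push my commit") for lf in lfs]
print(votes)
```

Each generated LF either votes for a label or abstains, so the votes on an unlabeled query can be aggregated (for example by majority) to produce training labels for the NLU.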


The path towards contact-based physical human-robot interaction

arXiv.org Artificial Intelligence

With the advancements in human-robot interaction (HRI), robots are now capable of operating in close proximity and engaging in physical interactions with humans (pHRI). Likewise, contact-based pHRI is becoming increasingly common as robots are equipped with a range of sensors to perceive human motions. Despite the presence of surveys exploring various aspects of HRI and pHRI, there is presently a gap in comprehensive studies that collect, organize and relate developments across all aspects of contact-based pHRI. It has become challenging to gain a comprehensive understanding of the current state of the field, thoroughly analyze the aspects that have been covered, and identify areas needing further attention. The present survey addresses this gap. While it includes key developments in pHRI, a particular focus is placed on contact-based interaction, which has numerous applications in industrial, rehabilitation and medical robotics. Across the literature, a common denominator is the importance of establishing a safe, compliant and human intention-oriented interaction. This endeavour encompasses aspects of perception, planning and control, and how they work together to enhance safety and reliability. Notably, the survey highlights the application of data-driven techniques: backed by a growing body of literature demonstrating their effectiveness, approaches like reinforcement learning and learning from demonstration have become key to improving robot perception and decision-making within complex and uncertain pHRI scenarios. As the field is still in its early stages, these observations may help guide future developments and steer research towards the responsible integration of physically interactive robots into workplaces, public spaces, and elements of private life.


Edge-DIRECT: A Deep Reinforcement Learning-based Method for Solving Heterogeneous Electric Vehicle Routing Problem with Time Window Constraints

arXiv.org Artificial Intelligence

This trend is particularly evident in the logistics sector, where companies are actively integrating EVs into their transportation fleets. At the heart of this transition is the electric vehicle routing problem (EVRP), an optimization problem central to the operations of these logistics companies, focusing on dealing with the complexities of deploying EVs instead of internal combustion engine vehicles. This article addresses a practical routing problem for EVs, named heterogeneous electric vehicle routing problem with time-window constraints (HEVRPTW). It considers both vehicle attributes, such as varying cargo and battery capacities [4], and customer preferences regarding delivery times [5]. These factors create a more realistic and applicable model for contemporary logistics challenges. HEVRPTW, recognized as an NP-hard optimization problem, seeks to determine a set of routes with minimal cost, total traveling time, or total traveling distance, for a fleet of heterogeneous EVs to serve each geographically dispersed customer's demands within a specified time-window. Traditional methods, including exact and heuristic solvers, are conventionally employed to solve various vehicle routing problem (VRP) variants. Due to the NP-hard nature of HEVRPTW, and VRPs in general, exact methods, such as branch-and-price [6] and branch-and-price-and-cut [7], take prohibitively long to solve practical-size problems [8].
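The constraints that define HEVRPTW (per-vehicle cargo and battery capacities plus customer time windows) can be made concrete with a small feasibility check for a single route. This is a sketch under strong simplifying assumptions that are not taken from the article: Euclidean travel, unit speed, energy use equal to distance travelled, no recharging stations, zero service time, and waiting allowed when a vehicle arrives early. All class and function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Vehicle:
    cargo_capacity: float
    battery_capacity: float  # assumption: expressed as maximum travel distance


@dataclass
class Customer:
    x: float
    y: float
    demand: float
    ready: float  # earliest allowed service time
    due: float    # latest allowed service time


def route_is_feasible(vehicle, depot, route):
    """Check cargo, time-window and battery constraints for one route."""
    if sum(c.demand for c in route) > vehicle.cargo_capacity:
        return False

    time, energy, pos = 0.0, 0.0, depot
    for c in route:
        dist = math.dist(pos, (c.x, c.y))
        energy += dist
        time = max(time + dist, c.ready)  # wait if arriving before `ready`
        if time > c.due:
            return False
        pos = (c.x, c.y)

    energy += math.dist(pos, depot)  # return trip to the depot
    return energy <= vehicle.battery_capacity


depot = (0.0, 0.0)
a = Customer(3.0, 4.0, demand=2.0, ready=0.0, due=10.0)
b = Customer(6.0, 8.0, demand=2.0, ready=0.0, due=20.0)
print(route_is_feasible(Vehicle(5.0, 25.0), depot, [a, b]))
```

A solver, whether exact, heuristic, or learned, has to search over assignments of customers to vehicles and visiting orders such that every route passes a check of this kind, which is where the combinatorial difficulty comes from.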


Navigating High-Degree Heterogeneity: Federated Learning in Aerial and Space Networks

arXiv.org Artificial Intelligence

Federated learning (FL) offers a compelling solution to the challenges of networking and data privacy within aerial and space networks (ASNs) by utilizing vast private edge data and computing capabilities accessible through drones, balloons, and satellites. While current research has focused on optimizing the learning process, improving computing efficiency, and minimizing communication overhead, the issue of heterogeneity and class imbalance remains a significant barrier to rapid model convergence. In our study, we explore the influence of heterogeneity on class imbalance, which diminishes performance in ASN-based federated learning. We illustrate the correlation between heterogeneity and class imbalance within grouped data and show how constraints such as battery life exacerbate the class imbalance challenge. Our findings indicate that ASN-based FL faces heightened class imbalance issues even with similar levels of heterogeneity compared to other scenarios. Finally, we analyze the impact of varying degrees of heterogeneity on FL training and evaluate the efficacy of current state-of-the-art algorithms under these conditions. Our results reveal that the heterogeneity challenge is more pronounced in ASN-based federated learning and that prevailing algorithms often fail to effectively address high levels of heterogeneity.


Let the Poem Hit the Rhythm: Using a Byte-Based Transformer for Beat-Aligned Poetry Generation

arXiv.org Artificial Intelligence

The intersection between poetry and music provides an interesting case for computational creativity, yet remains relatively unexplored. This paper explores the integration of poetry and music through the lens of beat patterns, investigating whether a byte-based language model can generate words that fit specific beat patterns within the context of poetry. Drawing on earlier studies, we developed a method to train a byte-based transformer model, ByT5, to align poems with beat patterns. The results demonstrate a high level of beat alignment while maintaining semantic coherence. Future work will aim to improve the model's ability to create complete beat-aligned poems.


Exploring the Efficacy of Large Language Models (GPT-4) in Binary Reverse Engineering

arXiv.org Artificial Intelligence

This study investigates the capabilities of Large Language Models (LLMs), specifically GPT-4, in the context of Binary Reverse Engineering (RE). Employing a structured experimental approach, we analyzed the LLM's performance in interpreting and explaining human-written and decompiled code. The research encompassed two phases: the first on basic code interpretation and the second on more complex malware analysis. Key findings indicate LLMs' proficiency in general code understanding, with varying effectiveness in detailed technical and security analyses. The study underscores the potential and current limitations of LLMs in reverse engineering, revealing crucial insights for future applications and improvements. We also examined our experimental methodologies, such as methods of evaluation and data constraints, which provided us with a technical vision for any future research activity in this field.


Augmented Physics: A Machine Learning-Powered Tool for Creating Interactive Physics Simulations from Static Diagrams

arXiv.org Artificial Intelligence

We introduce Augmented Physics, a machine learning-powered tool designed for creating interactive physics simulations from static textbook diagrams. Leveraging computer vision techniques, such as Segment Anything and OpenCV, our web-based system enables users to semi-automatically extract diagrams from physics textbooks and then generate interactive simulations based on the extracted content. These interactive diagrams are seamlessly integrated into scanned textbook pages, facilitating interactive and personalized learning experiences across various physics concepts, including gravity, optics, circuits, and kinematics. Drawing on an elicitation study with seven physics instructors, we explore four key augmentation techniques: 1) augmented experiments, 2) animated diagrams, 3) bi-directional manipulatives, and 4) parameter visualization. We evaluate our system through technical evaluation, a usability study (N=12), and expert interviews (N=12). The study findings suggest that our system can facilitate more engaging and personalized learning experiences in physics education.


Augmented Conversation with Embedded Speech-Driven On-the-Fly Referencing in AR

arXiv.org Artificial Intelligence

This paper introduces the concept of augmented conversation, which aims to support co-located in-person conversations via embedded speech-driven on-the-fly referencing in augmented reality (AR). Today, computing technologies like smartphones allow quick access to a variety of references during the conversation. However, these tools often create distractions, reducing eye contact and forcing users to focus their attention on phone screens and manually enter keywords to access relevant information. In contrast, AR-based on-the-fly referencing provides relevant visual references in real-time, based on keywords extracted automatically from the spoken conversation. By embedding these visual references in AR around the conversation partner, augmented conversation reduces distraction and friction, allowing users to maintain eye contact and supporting more natural social interactions. To demonstrate this concept, we developed \system, a Hololens-based interface that leverages real-time speech recognition, natural language processing and gaze-based interactions for on-the-fly embedded visual referencing. In this paper, we explore the design space of visual referencing for conversations, and describe our implementation -- building on seven design guidelines identified through a user-centered design process. An initial user study confirms that our system decreases distraction and friction in conversations compared to smartphone searches, while providing highly useful and relevant information.


RealitySummary: On-Demand Mixed Reality Document Enhancement using Large Language Models

arXiv.org Artificial Intelligence

We introduce RealitySummary, a mixed reality reading assistant that can enhance any printed or digital document using on-demand text extraction, summarization, and augmentation. While augmented reading tools promise to enhance physical reading experiences with overlaid digital content, prior systems have typically required pre-processed documents, which limits their generalizability and real-world use cases. In this paper, we explore on-demand document augmentation by leveraging large language models. To understand generalizable techniques for diverse documents, we first conducted an exploratory design study which identified five categories of document enhancements (summarization, augmentation, navigation, comparison, and extraction). Based on this, we developed a proof-of-concept system that can automatically extract and summarize text using Google Cloud OCR and GPT-4, then embed information around documents using a Microsoft Hololens 2 and Apple Vision Pro. We demonstrate real-time examples of six specific document augmentations: 1) summaries, 2) comparison tables, 3) timelines, 4) keyword lists, 5) summary highlighting, and 6) information cards. Results from a usability study (N=12) and in-the-wild study (N=11) highlight the potential benefits of on-demand MR document enhancement and opportunities for future research.