Goto

Collaborating Authors

 Calgary


Explaining EDA synthesis errors with LLMs

arXiv.org Artificial Intelligence

Training new engineers in digital design is a challenge, particularly when it comes to teaching the complex electronic design automation (EDA) tooling used in this domain. Learners will typically deploy designs in the Verilog and VHDL hardware description languages to Field Programmable Gate Arrays (FPGAs) from Altera (Intel) and Xilinx (AMD) via proprietary closed-source toolchains (Quartus Prime and Vivado, respectively). These tools are complex and difficult to use -- yet, as they are the tools used in industry, they are an essential first step in this space. In this work, we examine how recent advances in artificial intelligence may be leveraged to address aspects of this challenge. Specifically, we investigate if Large Language Models (LLMs), which have demonstrated text comprehension and question-answering capabilities, can be used to generate novice-friendly explanations of compile-time synthesis error messages from Quartus Prime and Vivado. To perform this study we generate 936 error message explanations using three OpenAI LLMs over 21 different buggy code samples. These are then graded for relevance and correctness, and we find that in approximately 71% of cases the LLMs give correct & complete explanations suitable for novice learners.


The Case for Developing a Foundation Model for Planning-like Tasks from Scratch

arXiv.org Artificial Intelligence

Foundation Models (FMs) have revolutionized many areas of computing, including Automated Planning and Scheduling (APS). For example, a recent study found them useful for planning problems: plan generation, language translation, model construction, multi-agent planning, interactive planning, heuristics optimization, tool integration, and brain-inspired planning. Besides APS, there are many seemingly related tasks involving the generation of a series of actions with varying guarantees of their executability to achieve intended goals, which we collectively call planning-like (PL) tasks like business processes, programs, workflows, and guidelines, where researchers have considered using FMs. However, previous works have primarily focused on pre-trained, off-the-shelf FMs and optionally fine-tuned them. This paper discusses the need for a comprehensive FM for PL tasks from scratch and explores its design considerations. We argue that such an FM will open new and efficient avenues for PL problem-solving, just like LLMs are creating for APS.


A Ground Mobile Robot for Autonomous Terrestrial Laser Scanning-Based Field Phenotyping

arXiv.org Artificial Intelligence

Traditional field phenotyping methods are often manual, time-consuming, and destructive, posing a challenge for breeding progress. To address this bottleneck, robotics and automation technologies offer efficient sensing tools to monitor field evolution and crop development throughout the season. This study aimed to develop an autonomous ground robotic system for LiDAR-based field phenotyping in plant breeding trials. A Husky platform was equipped with a high-resolution three-dimensional (3D) laser scanner to collect in-field terrestrial laser scanning (TLS) data without human intervention. To automate the TLS process, a 3D ray casting analysis was implemented for optimal TLS site planning, and a route optimization algorithm was utilized to minimize travel distance during data collection. The platform was deployed in two cotton breeding fields for evaluation, where it autonomously collected TLS data. The system provided accurate pose information through RTK-GNSS positioning and sensor fusion techniques, with average errors of less than 0.6 cm for location and 0.38$^{\circ}$ for heading. The achieved localization accuracy allowed point cloud registration with mean point errors of approximately 2 cm, comparable to traditional TLS methods that rely on artificial targets and manual sensor deployment. This work presents an autonomous phenotyping platform that facilitates the quantitative assessment of plant traits under field conditions of both large agricultural fields and small breeding trials to contribute to the advancement of plant phenomics and breeding programs.


AI-generated text boundary detection with RoFT

arXiv.org Artificial Intelligence

Due to the rapid development of large language models, people increasingly often encounter texts that may start as written by a human but continue as machine-generated. Detecting the boundary between human-written and machine-generated parts of such texts is a challenging problem that has not received much attention in literature. We attempt to bridge this gap and examine several ways to adapt state of the art artificial text detection classifiers to the boundary detection setting. We push all detectors to their limits, using the Real or Fake text benchmark that contains short texts on several topics and includes generations of various language models. We use this diversity to deeply examine the robustness of all detectors in cross-domain and cross-model settings to provide baselines and insights for future research. In particular, we find that perplexity-based approaches to boundary detection tend to be more robust to peculiarities of domain-specific data than supervised fine-tuning of the RoBERTa model; we also find which features of the text confuse boundary detection algorithms and negatively influence their performance in cross-domain settings.


VortexViz: Finding Vortex Boundaries by Learning from Particle Trajectories

arXiv.org Artificial Intelligence

Abstract--Vortices are studied in various scientific disciplines, offering insights into fluid flow behavior. Visualizing the boundary of vortices is crucial for understanding flow phenomena and detecting flow irregularities. This paper addresses the challenge of accurately extracting vortex boundaries using deep learning techniques. While existing methods primarily train on velocity components, we propose a novel approach incorporating particle trajectories (streamlines or pathlines) into the learning process. By leveraging the regional/local characteristics of the flow field captured by streamlines or pathlines, our methodology aims to enhance the accuracy of vortex boundary extraction. In aerodynamics, researchers focus on studying vortices that form in the wake of an aircraft, aiming to mitigate the creation of vortices with long lifetimes; persistent vortices can potentially impede commercial aviation's operational capacity [1]-[3]. Oceanographers, on the other hand, study mesoscale eddies modeled as vortices, to understand the transportation of nutrients and heat in ocean currents [4]- [6].


Debiasing Cardiac Imaging with Controlled Latent Diffusion Models

arXiv.org Artificial Intelligence

The progress in deep learning solutions for disease diagnosis and prognosis based on cardiac magnetic resonance imaging is hindered by highly imbalanced and biased training data. To address this issue, we propose a method to alleviate imbalances inherent in datasets through the generation of synthetic data based on sensitive attributes such as sex, age, body mass index, and health condition. We adopt ControlNet based on a denoising diffusion probabilistic model to condition on text assembled from patient metadata and cardiac geometry derived from segmentation masks using a large-cohort study, specifically, the UK Biobank. We assess our method by evaluating the realism of the generated images using established quantitative metrics. Furthermore, we conduct a downstream classification task aimed at debiasing a classifier by rectifying imbalances within underrepresented groups through synthetically generated samples. Our experiments demonstrate the effectiveness of the proposed approach in mitigating dataset imbalances, such as the scarcity of younger patients or individuals with normal BMI level suffering from heart failure. This work represents a major step towards the adoption of synthetic data for the development of fair and generalizable models for medical classification tasks. Notably, we conduct all our experiments using a single, consumer-level GPU to highlight the feasibility of our approach within resource-constrained environments.


The Solution of the Zodiac Killer's 340-Character Cipher

arXiv.org Artificial Intelligence

The case of the Zodiac Killer is one of the most widely known unsolved serial killer cases in history. The unidentified killer murdered five known victims and terrorized the state of California. He also communicated extensively with the press and law enforcement. Besides his murders, Zodiac was known for his use of ciphers. The first Zodiac cipher was solved within a week of its publication, while the second cipher was solved by the authors after 51 years, when it was discovered to be a transposition and homophonic substitution cipher with unusual qualities. In this paper, we detail the historical significance of this cipher and the numerous efforts which culminated in its solution.


ChatGPT Incorrectness Detection in Software Reviews

arXiv.org Artificial Intelligence

We conducted a survey of 135 software engineering (SE) practitioners to understand how they use Generative AI-based chatbots like ChatGPT for SE tasks. We find that they want to use ChatGPT for SE tasks like software library selection but often worry about the truthfulness of ChatGPT responses. We developed a suite of techniques and a tool called CID (ChatGPT Incorrectness Detector) to automatically test and detect the incorrectness in ChatGPT responses. CID is based on the iterative prompting to ChatGPT by asking it contextually similar but textually divergent questions (using an approach that utilizes metamorphic relationships in texts). The underlying principle in CID is that for a given question, a response that is different from other responses (across multiple incarnations of the question) is likely an incorrect response. In a benchmark study of library selection, we show that CID can detect incorrect responses from ChatGPT with an F1-score of 0.74 - 0.75.


Multiple-Input Auto-Encoder Guided Feature Selection for IoT Intrusion Detection Systems

arXiv.org Artificial Intelligence

While intrusion detection systems (IDSs) benefit from the diversity and generalization of IoT data features, the data diversity (e.g., the heterogeneity and high dimensions of data) also makes it difficult to train effective machine learning models in IoT IDSs. This also leads to potentially redundant/noisy features that may decrease the accuracy of the detection engine in IDSs. This paper first introduces a novel neural network architecture called Multiple-Input Auto-Encoder (MIAE). MIAE consists of multiple sub-encoders that can process inputs from different sources with different characteristics. The MIAE model is trained in an unsupervised learning mode to transform the heterogeneous inputs into lower-dimensional representation, which helps classifiers distinguish between normal behaviour and different types of attacks. To distil and retain more relevant features but remove less important/redundant ones during the training process, we further design and embed a feature selection layer right after the representation layer of MIAE resulting in a new model called MIAEFS. This layer learns the importance of features in the representation vector, facilitating the selection of informative features from the representation vector. The results on three IDS datasets, i.e., NSLKDD, UNSW-NB15, and IDS2017, show the superior performance of MIAE and MIAEFS compared to other methods, e.g., conventional classifiers, dimensionality reduction models, unsupervised representation learning methods with different input dimensions, and unsupervised feature selection models. Moreover, MIAE and MIAEFS combined with the Random Forest (RF) classifier achieve accuracy of 96.5% in detecting sophisticated attacks, e.g., Slowloris. The average running time for detecting an attack sample using RF with the representation of MIAE and MIAEFS is approximate 1.7E-6 seconds, whilst the model size is lower than 1 MB.


Multiple and Gyro-Free Inertial Datasets

arXiv.org Artificial Intelligence

An inertial navigation system (INS) utilizes three orthogonal accelerometers and gyroscopes to determine platform position, velocity, and orientation. There are countless applications for INS, including robotics, autonomous platforms, and the internet of things. Recent research explores the integration of data-driven methods with INS, highlighting significant innovations, improving accuracy and efficiency. Despite the growing interest in this field and the availability of INS datasets, no datasets are available for gyro-free INS (GFINS) and multiple inertial measurement unit (MIMU) architectures. To fill this gap and to stimulate further research in this field, we designed and recorded GFINS and MIMU datasets using 54 inertial sensors grouped in nine inertial measurement units. These sensors can be used to define and evaluate different types of MIMU and GFINS architectures. The inertial sensors were arranged in three different sensor configurations and mounted on a mobile robot and a passenger car. In total, the dataset contains 35 hours of inertial data and corresponding ground truth trajectories. The data and code are freely accessible through our GitHub repository.