 Kelowna


Model Compression Methods for YOLOv5: A Review

arXiv.org Artificial Intelligence

Over the past few years, extensive research has been devoted to enhancing YOLO object detectors. Since its introduction, eight major versions of YOLO have been released with the purpose of improving its accuracy and efficiency. While the evident merits of YOLO have led to its extensive use in many areas, deploying it on resource-limited devices poses challenges. To address this issue, various neural network compression methods have been developed, which fall under three main categories, namely network pruning, quantization, and knowledge distillation. The fruitful outcomes of utilizing model compression methods, such as lowering memory usage and inference time, make them favorable, if not necessary, for deploying large neural networks on hardware-constrained edge devices. In this review paper, our focus is on pruning and quantization due to their comparative modularity. We categorize them and analyze the practical results of applying those methods to YOLOv5. By doing so, we identify gaps in adapting pruning and quantization for compressing YOLOv5, and provide future directions in this area for further exploration. Among several versions of YOLO, we specifically choose YOLOv5 for its excellent trade-off between recency and popularity in the literature. This is the first specific review paper that surveys pruning and quantization methods from an implementation point of view on YOLOv5. Our study is also extendable to newer versions of YOLO, as implementing them on resource-limited devices poses the same challenges that persist even today. This paper targets those interested in the practical deployment of model compression methods on YOLOv5, and in exploring different compression techniques that can be used for subsequent versions of YOLO.
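To make the two compression families named in the abstract concrete, here is a minimal sketch of unstructured magnitude pruning and 8-bit affine quantization on a flat list of weights. This is a generic textbook illustration in pure Python, not the specific methods the review surveys or any YOLOv5 code; the function names `magnitude_prune`, `quantize_uint8`, and `dequantize` are hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)  # number of weights to remove
    if k == 0:
        return list(weights)
    # Indices of the k smallest-magnitude weights
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_uint8(weights):
    """Affine (asymmetric) quantization of floats to integers in [0, 255]."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 for constant input
    zero_point = round(-lo / scale)  # integer that represents 0.0
    q = [max(0, min(255, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Map quantized integers back to approximate float values."""
    return [(v - zero_point) * scale for v in q]


weights = [0.5, -0.1, 0.05, -0.9]
pruned = magnitude_prune(weights, sparsity=0.5)
q, scale, zero_point = quantize_uint8(weights)
restored = dequantize(q, scale, zero_point)
```

Pruning reduces the number of non-zero parameters, while quantization reduces the bits stored per parameter; the reconstruction error of the quantizer above is bounded by one quantization step (`scale`).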


Systematic Adaptation of Communication-focused Machine Learning Models from Real to Virtual Environments for Human-Robot Collaboration

arXiv.org Artificial Intelligence

Virtual reality has proved to be useful in applications in several fields ranging from gaming, medicine, and training to the development of interfaces that enable human-robot collaboration. It empowers designers to explore applications outside of the constraints posed by the real-world environment and develop innovative solutions and experiences. Hand gesture recognition, which has been a topic of much research and subsequent commercialization in the real world, has been possible because of the creation of large, labelled datasets. In order to utilize the power of natural and intuitive hand gestures in the virtual domain for enabling embodied teleoperation of collaborative robots, similarly large datasets must be created so as to keep the working interface easy to learn and flexible enough to add more gestures. Depending on the application, this may be computationally or economically prohibitive. Thus, the adaptation of trained deep learning models that perform well in the real environment to the virtual one may be a solution to this challenge. This paper presents a systematic framework for real-to-virtual adaptation using a virtual dataset of limited size, along with guidelines for creating a curated dataset. Finally, while hand gestures have been considered as the communication mode, the guidelines and recommendations presented are generic. They are applicable to other modes, such as body poses and facial expressions, which have large datasets available in the real domain that must be adapted to the virtual one.


LearnedSort as a learning-augmented SampleSort: Analysis and Parallelization

arXiv.org Artificial Intelligence

This work analyzes and parallelizes LearnedSort, the novel algorithm that sorts using machine learning models based on the cumulative distribution function. LearnedSort is analyzed under the lens of algorithms with predictions, and it is argued that LearnedSort is a learning-augmented SampleSort. A parallel LearnedSort algorithm is developed combining LearnedSort with the state-of-the-art SampleSort implementation, IPS4o. Benchmarks on synthetic and real-world datasets demonstrate improved parallel performance for parallel LearnedSort compared to IPS4o and other sorting algorithms.
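The idea of sorting with a model of the cumulative distribution function can be sketched as follows. This is a simplified illustration, not the LearnedSort or IPS4o implementation: the "model" here is just an empirical CDF built from a sorted random sample (which is why the result resembles SampleSort), and the function name and parameters are hypothetical.

```python
import bisect
import random


def cdf_bucket_sort(data, num_buckets=16, sample_size=64, seed=0):
    """Sort `data` by routing each element to a bucket chosen by a CDF estimate."""
    if len(data) <= 1:
        return list(data)
    rng = random.Random(seed)
    # "Training": an empirical CDF estimated from a small sorted sample
    sample = sorted(rng.sample(data, min(sample_size, len(data))))

    def predicted_cdf(x):
        # Fraction of sample elements <= x, a value in [0, 1]
        return bisect.bisect_right(sample, x) / len(sample)

    buckets = [[] for _ in range(num_buckets)]
    for x in data:
        # The predicted CDF is monotone in x, so bucket indices preserve order
        idx = min(int(predicted_cdf(x) * num_buckets), num_buckets - 1)
        buckets[idx].append(x)

    result = []
    for bucket in buckets:
        bucket.sort()  # buckets are independent, so this step could run in parallel
        result.extend(bucket)
    return result


values = [random.Random(1).gauss(0.0, 1.0) for _ in range(10)]
values = [random.Random(i).gauss(0.0, 1.0) for i in range(1000)]
sorted_values = cdf_bucket_sort(values)
```

Because the bucket index is a monotone function of the key, concatenating the individually sorted buckets yields a fully sorted output regardless of how accurate the CDF estimate is; accuracy only affects how evenly the buckets are filled.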


Coding for the Gaussian Channel in the Finite Blocklength Regime Using a CNN-Autoencoder

arXiv.org Artificial Intelligence

The development of delay-sensitive applications that require ultra-high reliability created an additional challenge for wireless networks. This led to Ultra-Reliable Low-Latency Communications as a use case that 5G and beyond-5G systems must support. However, supporting low-latency communications requires the use of short codes, while attaining vanishing frame error probability (FEP) requires long codes. Thus, developing codes for the finite blocklength regime (FBR) that achieve certain reliability requirements is necessary. This paper investigates the potential of Convolutional Neural Network autoencoders (CNN-AEs) in approaching the theoretical maximum achievable rate over a Gaussian channel for a range of signal-to-noise ratios at a fixed blocklength and target FEP, which is a different perspective compared to existing works that explore the use of CNNs from bit-error and symbol-error rate perspectives. We explain the studied CNN-AE architecture, evaluate it numerically, and compare it to the theoretical maximum achievable rate and the achievable rates of polar coded quadrature amplitude modulation (QAM), Reed-Muller coded QAM, multilevel polar coded modulation, and a TurboAE-MOD scheme from the literature. Numerical results show that the CNN-AE outperforms these benchmark schemes and approaches the theoretical maximum rate, demonstrating the capability of CNN-AEs in learning good codes for delay-constrained applications.
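The tension between short codes and low error probability can be illustrated numerically. The abstract does not state which bound it uses as the "theoretical maximum achievable rate"; the sketch below assumes the widely used normal approximation for the real-valued Gaussian channel, R ≈ C − sqrt(V/n)·Q⁻¹(ε) + log2(n)/(2n), with capacity C and channel dispersion V. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def awgn_capacity(snr):
    """Capacity of the real AWGN channel in bits per channel use (linear SNR)."""
    return 0.5 * math.log2(1.0 + snr)


def awgn_dispersion(snr):
    """Channel dispersion of the real AWGN channel in bits^2 per channel use."""
    return snr * (snr + 2.0) / (2.0 * (snr + 1.0) ** 2) * math.log2(math.e) ** 2


def normal_approx_rate(snr, n, eps):
    """Normal approximation to the maximum rate at blocklength n and error probability eps."""
    q_inv = NormalDist().inv_cdf(1.0 - eps)  # inverse Gaussian Q-function
    penalty = math.sqrt(awgn_dispersion(snr) / n) * q_inv
    return awgn_capacity(snr) - penalty + math.log2(n) / (2.0 * n)


snr = 1.0  # 0 dB, as a linear ratio
rate_short = normal_approx_rate(snr, n=128, eps=1e-3)
rate_long = normal_approx_rate(snr, n=1024, eps=1e-3)
```

Under this approximation the achievable rate at a fixed error probability stays below capacity and approaches it only as the blocklength grows, which is the gap that short-code designs try to close.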


Exploiting Symmetry and Heuristic Demonstrations in Off-policy Reinforcement Learning for Robotic Manipulation

arXiv.org Artificial Intelligence

Reinforcement learning demonstrates significant potential in automatically building control policies in numerous domains, but shows low efficiency when applied to robot manipulation tasks due to the curse of dimensionality. To facilitate the learning of such tasks, prior knowledge or heuristics that incorporate inherent simplification can effectively improve the learning performance. This paper aims to define and incorporate the natural symmetry present in physical robotic environments. Then, sample-efficient policies are trained by exploiting the expert demonstrations in symmetrical environments through an amalgamation of reinforcement and behavior cloning, which gives the off-policy learning process a diverse yet compact initiation. Furthermore, it presents a rigorous framework for a recent concept and explores its scope for robot manipulation tasks. The proposed method is validated via two point-to-point reaching tasks of an industrial arm, with and without an obstacle, in a simulation experiment study. A PID controller, which tracks the linear joint-space trajectories with hard-coded temporal logic to produce interim midpoints, is used to generate demonstrations in the study. The results of the study present the effect of the number of demonstrations and quantify the magnitude of behavior cloning to exemplify the possible improvement of model-free reinforcement learning in common manipulation tasks. A comparison study between the proposed method and a traditional off-policy reinforcement learning algorithm indicates its advantage in learning performance and potential value for applications.


AI Eye Podcast: Stocks discussed: (NYSE: NOTE) (NasdaqGS: NVDA)

#artificialintelligence

Investorideas.com, a global investor news source covering Artificial Intelligence (AI), brings you today's edition of The AI Eye - watching stock news, deal tracker and advancements in artificial intelligence. FiscalNote Holdings, Inc. (NYSE:NOTE) has announced it "has been selected as one of 14 inaugural 'trusted partners' - and the sole provider of legal, political, and regulatory data and information - to collaborate with AI research and deployment company OpenAI by enabling access to select FiscalNote market leading real-time data sets and content for users of OpenAI's ChatGPT platform." FiscalNote's Chairman, CEO, and Co-founder, Tim Hwang, said: "Since we founded FiscalNote a decade ago, the company has been an early adopter and pioneer of AI, uniquely applying it to the political and legal domain, and building a specialized expertise that has made us the unparalleled leader in this space. We're excited to collaborate with OpenAI and, as the market leader in legal and regulatory intelligence, we intend to continue to always be at the forefront as technological capabilities continue to advance. We believe this is the beginning of an innovative collaboration with a fellow AI pioneer, and we intend to continue to push the bounds of what is possible as we use this cutting-edge technology to deliver results for our global customers and advance their business objectives."


An Application of Deep Learning for Sweet Cherry Phenotyping using YOLO Object Detection

arXiv.org Artificial Intelligence

Tree fruit breeding is a long-term activity involving repeated measurements of various fruit quality traits on a large number of samples. These traits are traditionally measured by manually counting the fruits, weighing them to indirectly measure fruit size, and subjectively classifying fruit colour into categories by visual comparison to colour charts. These processes are slow, expensive and subject to evaluators' bias and fatigue. Recent advancements in deep learning can help automate this process. Objective data can be generated for consistent characterization of germplasm, with greater speed and higher accuracy. A method was developed to automatically count the number of sweet cherry fruits in a camera's field of view in real time using YOLOv3. A system capable of analyzing the image data for other traits such as size and colour was also developed using Python. The YOLO model obtained close to 99% accuracy in object detection and counting of cherries and 90% on the Intersection over Union metric for object localization when extracting size and colour information. The model surpasses human performance and offers a significant improvement compared to manual counting.
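For readers unfamiliar with the Intersection over Union metric mentioned above, a minimal sketch follows. It assumes axis-aligned boxes given as `(x1, y1, x2, y2)` corner coordinates; the box format and the function name `iou` are assumptions for illustration, not taken from the paper.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle, if any
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp at zero so disjoint boxes yield an empty intersection
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))  # intersection 1, union 7
```

An IoU of 1 means the predicted and reference boxes coincide, and 0 means they do not overlap at all.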


Semantic Encoder Guided Generative Adversarial Face Ultra-Resolution Network

arXiv.org Artificial Intelligence

Face super-resolution is a domain-specific form of image super-resolution, which aims to generate High-Resolution (HR) face images from their Low-Resolution (LR) counterparts. In this paper, we propose a novel face super-resolution method, namely Semantic Encoder guided Generative Adversarial Face Ultra-Resolution Network (SEGA-FURN), to ultra-resolve an unaligned tiny LR face image to its HR counterpart with multiple ultra-upscaling factors (e.g., 4x and 8x). The proposed network is composed of a novel semantic encoder that has the ability to capture the embedded semantics to guide adversarial learning and a novel generator that uses a hierarchical architecture named Residual in Internal Dense Block (RIDB). Moreover, we propose a joint discriminator which discriminates both image data and embedded semantics. The joint discriminator learns the joint probability distribution of the image space and latent space. We also use a Relativistic average Least Squares loss (RaLS) as the adversarial loss to alleviate the gradient vanishing problem and enhance the stability of the training procedure. Extensive experiments on large face datasets have proved that the proposed method can achieve superior super-resolution results and significantly outperform other state-of-the-art methods in both qualitative and quantitative comparisons.
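The relativistic average least squares loss can be sketched on raw discriminator scores as below. This follows the commonly cited RaLSGAN formulation, in which each score is compared against the mean score of the opposite class; whether SEGA-FURN uses exactly this form (for example, the same scaling or the joint image-and-semantics input) is an assumption, and the function names are hypothetical.

```python
def _mean(values):
    return sum(values) / len(values)


def rals_discriminator_loss(real_scores, fake_scores):
    """Push real scores one unit above the average fake score, and vice versa."""
    mean_real, mean_fake = _mean(real_scores), _mean(fake_scores)
    real_term = _mean([(r - mean_fake - 1.0) ** 2 for r in real_scores])
    fake_term = _mean([(f - mean_real + 1.0) ** 2 for f in fake_scores])
    return real_term + fake_term


def rals_generator_loss(real_scores, fake_scores):
    """Same form with the targets swapped, so the generator pulls fakes above reals."""
    mean_real, mean_fake = _mean(real_scores), _mean(fake_scores)
    fake_term = _mean([(f - mean_real - 1.0) ** 2 for f in fake_scores])
    real_term = _mean([(r - mean_fake + 1.0) ** 2 for r in real_scores])
    return fake_term + real_term


real = [0.5, 0.5]
fake = [-0.5, -0.5]
d_loss = rals_discriminator_loss(real, fake)
g_loss = rals_generator_loss(real, fake)
```

Because the generator loss depends on the real scores as well as the fake ones, gradients do not vanish simply because the discriminator has become confident about fake samples.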


Real-World Image Super Resolution via Unsupervised Bi-directional Cycle Domain Transfer Learning based Generative Adversarial Network

arXiv.org Artificial Intelligence

Deep Convolutional Neural Networks (DCNNs) have exhibited impressive performance on image super-resolution tasks. However, these deep learning-based super-resolution methods perform poorly in real-world super-resolution tasks, where the paired high-resolution and low-resolution images are unavailable and the low-resolution images are degraded by complicated and unknown kernels. To break these limitations, we propose the Unsupervised Bi-directional Cycle Domain Transfer Learning-based Generative Adversarial Network (UBCDTL-GAN), which consists of an Unsupervised Bi-directional Cycle Domain Transfer Network (UBCDTN) and the Semantic Encoder guided Super Resolution Network (SESRN). First, the UBCDTN is able to produce an approximated real-like LR image through transferring the LR image from an artificially degraded domain to the real-world LR image domain. Second, the SESRN has the ability to super-resolve the approximated real-like LR image to a photo-realistic HR image. Extensive experiments on unpaired real-world image benchmark datasets demonstrate that the proposed method achieves superior performance compared to state-of-the-art methods.


Learning Branching Heuristics from Graph Neural Networks

arXiv.org Artificial Intelligence

Backtracking has been widely used for solving problems in artificial intelligence (AI), including constraint satisfaction problems and combinatorial optimization problems. Good branching heuristics can efficiently improve the performance of backtracking by helping prune the search space and leading the search to the most promising direction. In this paper, we first propose a new graph neural network (GNN) model designed using the probabilistic method. From the GNN model, we introduce an approach to learn a branching heuristic for combinatorial optimization problems. In particular, our GNN model learns appropriate probability distributions on vertices in given graphs from which the branching heuristic is extracted and used in a backtracking search. Our experimental results for the (minimum) dominating-clique problem show that this learned branching heuristic performs better than the minimum-remaining-values heuristic in terms of the number of branches of the whole search tree. Our approach introduces a new way of applying GNNs towards enhancing the classical backtracking algorithm used in AI.
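How a branching heuristic plugs into backtracking can be sketched for the minimum dominating-clique problem. In the sketch below, the per-vertex `scores` are a hypothetical stand-in for the probabilities a trained GNN would output; the search itself is a plain exhaustive backtracking with a size bound, not the authors' algorithm.

```python
def min_dominating_clique(adj, scores):
    """Return (smallest dominating clique or None, number of branches explored).

    adj: dict mapping each vertex to the set of its neighbours.
    scores: dict mapping each vertex to a priority; higher is tried first.
    """
    order = sorted(adj, key=lambda v: -scores[v])  # the branching heuristic
    best = None
    branches = 0

    def dominates(clique):
        covered = set(clique)
        for v in clique:
            covered |= adj[v]
        return len(covered) == len(adj)

    def search(clique, candidates):
        nonlocal best, branches
        branches += 1
        if best is not None and len(clique) >= len(best):
            return  # bound: this branch cannot beat the incumbent
        if clique and dominates(clique):
            best = list(clique)
            return
        for i, v in enumerate(candidates):
            # Keep only later candidates adjacent to v, so the set stays a clique
            rest = [u for u in candidates[i + 1:] if u in adj[v]]
            search(clique + [v], rest)

    search([], order)
    return best, branches


# Path graph 0 - 1 - 2 - 3; its only dominating clique is {1, 2}
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
good_scores = {0: 0.1, 1: 0.9, 2: 0.8, 3: 0.1}
solution, explored = min_dominating_clique(path, good_scores)
```

A heuristic that ranks promising vertices first finds a small solution early, which tightens the size bound and reduces the number of branches; the number of branches is the measure the abstract uses to compare heuristics.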