Collaborating Authors

 Edmonton


Joint Level Generation and Translation Using Gameplay Videos

arXiv.org Artificial Intelligence

Procedural Content Generation via Machine Learning (PCGML) faces a significant hurdle that sets it apart from other fields, such as image or text generation: limited annotated data. Many existing methods for procedural level generation via machine learning require a secondary representation besides level images. However, the current methods for obtaining such representations are laborious and time-consuming, which contributes to this problem. In this work, we aim to address this problem by utilizing gameplay videos of two human-annotated games to develop a novel multi-tail framework that learns to perform simultaneous level translation and generation. The translation tail of our framework can convert gameplay video frames to an equivalent secondary representation, while its generation tail can produce novel level segments. Evaluation results and comparisons between our framework and baselines suggest that combining the level generation and translation tasks can improve overall performance on both tasks. This represents a possible solution to limited annotated level data, and we demonstrate the potential for future versions to generalize to unseen games.


Value-aware Importance Weighting for Off-policy Reinforcement Learning

arXiv.org Artificial Intelligence

Importance sampling is a central idea underlying off-policy prediction in reinforcement learning. It provides a strategy for re-weighting samples from a distribution to obtain unbiased estimates under another distribution. However, importance sampling weights tend to exhibit extreme variance, often leading to stability issues in practice. In this work, we consider a broader class of importance weights to correct samples in off-policy learning. We propose the use of $\textit{value-aware importance weights}$ which take into account the sample space to provide lower variance, but still unbiased, estimates under a target distribution. We derive how such weights can be computed, and detail key properties of the resulting importance weights. We then extend several reinforcement learning prediction algorithms to the off-policy setting with these weights, and evaluate them empirically.
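The re-weighting idea described above can be illustrated with a minimal sketch of ordinary importance sampling. This is the standard baseline estimator, not the value-aware weights proposed in the paper; the two-action policies, the reward table, and the function names are hypothetical and chosen only for illustration.

```python
import random

def importance_weights(target, behavior):
    """Per-action ratios rho(a) = target(a) / behavior(a)."""
    return {a: target[a] / behavior[a] for a in behavior}

def is_estimate(samples, rho, reward):
    """Ordinary importance-sampling estimate of E_target[reward]
    computed from actions that were drawn under the behavior policy."""
    return sum(rho[a] * reward[a] for a in samples) / len(samples)

# Hypothetical two-action example (all numbers are assumptions).
behavior = {"left": 0.5, "right": 0.5}   # policy that generated the data
target = {"left": 0.1, "right": 0.9}     # policy we want to evaluate
reward = {"left": 0.0, "right": 1.0}

rho = importance_weights(target, behavior)

# Draw actions from the behavior policy with a fixed seed.
rng = random.Random(0)
actions = list(behavior)
samples = rng.choices(actions, weights=[behavior[a] for a in actions], k=10_000)

estimate = is_estimate(samples, rho, reward)

# Exact value under the target policy, for comparison with the estimate.
true_value = sum(target[a] * reward[a] for a in target)

# Unbiasedness in expectation: E_behavior[rho * reward] equals E_target[reward].
expected_under_behavior = sum(behavior[a] * rho[a] * reward[a] for a in behavior)
```

Here the weights range from 0.2 to 1.8; the further the target policy departs from the behavior policy, the wider that range becomes, which is the variance problem the abstract refers to.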


Real-Time Reinforcement Learning for Vision-Based Robotics Utilizing Local and Remote Computers

arXiv.org Artificial Intelligence

Real-time learning is crucial for robotic agents adapting to ever-changing, non-stationary environments. A common setup for a robotic agent is to have two different computers simultaneously: a resource-limited local computer tethered to the robot and a powerful remote computer connected wirelessly. Given such a setup, it is unclear to what extent the performance of a learning system can be affected by resource limitations and how to efficiently use the wirelessly connected powerful computer to compensate for any performance loss. In this paper, we implement a real-time learning system called the Remote-Local Distributed (ReLoD) system to distribute computations of two deep reinforcement learning (RL) algorithms, Soft Actor-Critic (SAC) and Proximal Policy Optimization (PPO), between a local and a remote computer. The performance of the system is evaluated on two vision-based control tasks developed using a robotic arm and a mobile robot. Our results show that SAC's performance degrades heavily on a resource-limited local computer. Strikingly, when all computations of the learning system are deployed on a remote workstation, SAC fails to compensate for the performance loss, indicating that, without careful consideration, using a powerful remote computer may not result in performance improvement. However, a carefully chosen distribution of computations of SAC consistently and substantially improves its performance on both tasks. On the other hand, the performance of PPO remains largely unaffected by the distribution of computations. In addition, when all computations happen solely on a powerful tethered computer, the performance of our system remains on par with an existing system that is well-tuned for using a single machine. ReLoD is the only publicly available system for real-time RL that applies to multiple robots for vision-based tasks.


Sim-to-Real Surgical Robot Learning and Autonomous Planning for Internal Tissue Points Manipulation using Reinforcement Learning

arXiv.org Artificial Intelligence

Indirect simultaneous positioning (ISP), where internal tissue points are placed at desired locations indirectly through the manipulation of boundary points, is a type of subtask frequently performed in robotic surgeries. Although challenging due to complex tissue dynamics, automating the task can potentially reduce the workload of surgeons. This paper presents a sim-to-real framework for learning to automate the task without interacting with a real environment, and for planning preoperatively to find the grasping points that minimize local tissue deformation. A control policy is learned using deep reinforcement learning (DRL) in an FEM-based simulation environment and transferred to a real-world setting. Grasping points are planned in the simulator by applying Bayesian optimization (BO) to the trained policy. Inconsistent simulation performance is overcome by formulating the problem as a state-augmented Markov decision process (MDP). Experimental results show that the learned policy places the internal tissue points accurately, and that the planned grasping points yield small tissue deformation among the trials. The proposed learning and planning scheme is able to automate internal tissue point manipulation in surgeries and has the potential to be generalized to complex surgical scenarios.


UAlberta at SemEval-2023 Task 1: Context Augmentation and Translation for Multilingual Visual Word Sense Disambiguation

arXiv.org Artificial Intelligence

We describe the systems of the University of Alberta team for the SemEval-2023 Visual Word Sense Disambiguation (V-WSD) Task. We present a novel algorithm that leverages glosses retrieved from BabelNet, in combination with text and image encoders. Furthermore, we compare language-specific encoders against the application of English encoders to translated texts. As the contexts given in the task datasets are extremely short, we also experiment with augmenting these contexts with descriptions generated by a language model. This yields substantial improvements in accuracy. We describe and evaluate additional V-WSD methods which use image generation and text-conditioned image segmentation. Overall, the results of our official submission rank us 18 out of 56 teams. Some of our unofficial results are even better than the official ones. Our code is publicly available at https://github.com/UAlberta-NLP/v-wsd.


Machine Learning Approach for Cancer Entities Association and Classification

arXiv.org Artificial Intelligence

Numerous biomedical research articles are published regularly, adding knowledge to the accumulated literature on different diseases, such as cancer, neurodegenerative diseases, and hereditary diseases. Cancer is one of the leading causes of global mortality, owing to factors such as lifestyle habits, radiation exposure, viral infections, and tobacco consumption [1] [2]. These factors ultimately cause genetic changes in a cell of a tissue, which make it cancerous. Because cancer research has been given top priority compared to other human diseases, an enormous number of articles have been published [3] [4] in a short period [5]. This literature can serve as a relevant source for cancer knowledge discovery in diagnostics, drug application, genetic association, prevention, and treatment. Automated downloading of articles and extraction of related entities would accelerate research. Natural Language Processing (NLP) enables computers to communicate with humans in their own language and converts unstructured data into structured data to improve the accuracy of text mining. NLP helps interpret human query language to discover knowledge from the literature without much manual effort [6]. Named Entity Recognition (NER) and text classification are the main techniques used for text mining [7].


Correcting discount-factor mismatch in on-policy policy gradient methods

arXiv.org Artificial Intelligence

The policy gradient theorem gives a convenient form of the policy gradient in terms of three factors: an action value, a gradient of the action likelihood, and a state distribution involving discounting called the \emph{discounted stationary distribution}. But commonly used on-policy methods based on the policy gradient theorem ignore the discount factor in the state distribution, which is technically incorrect and may even cause degenerate learning behavior in some environments. An existing solution corrects this discrepancy by using $\gamma^t$ as a factor in the gradient estimate. However, this solution is not widely adopted and does not work well in tasks where the later states are similar to earlier states. We introduce a novel distribution correction to account for the discounted stationary distribution that can be plugged into many existing gradient estimators. Our correction avoids the performance degradation associated with the $\gamma^t$ correction and has lower variance. Importantly, compared to the uncorrected estimators, our algorithm provides improved state emphasis to evade suboptimal policies in certain environments and consistently matches or exceeds the original performance on several OpenAI gym and DeepMind suite benchmarks.
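The difference between the commonly used estimator and the existing $\gamma^t$ correction mentioned above can be sketched for a REINFORCE-style estimator, where each time step contributes a scalar weight that multiplies the gradient of the log-likelihood of the action taken. This sketch shows only that existing correction, not the distribution correction the paper introduces; the function names and the toy reward sequence are hypothetical.

```python
def discounted_returns(rewards, gamma):
    """Return G_t = r_t + gamma * r_{t+1} + ... for every step t."""
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns

def step_weights(rewards, gamma, correct_state_distribution):
    """Scalar that multiplies grad log pi(a_t | s_t) at each step t.

    Without the correction the weight is G_t alone, which ignores the
    discounting in the state distribution. With the correction the
    weight is gamma**t * G_t.
    """
    returns = discounted_returns(rewards, gamma)
    if correct_state_distribution:
        return [(gamma ** t) * g for t, g in enumerate(returns)]
    return list(returns)

# Toy episode (assumed values): three steps, reward 1 at each step.
rewards = [1.0, 1.0, 1.0]
uncorrected = step_weights(rewards, gamma=0.5, correct_state_distribution=False)
corrected = step_weights(rewards, gamma=0.5, correct_state_distribution=True)
```

With these values the corrected weights shrink geometrically with the time step, so late states contribute little to the update. This is consistent with the abstract's remark that the $\gamma^t$ factor works poorly when later states resemble earlier ones.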


Exploring the Training Robustness of Distributional Reinforcement Learning against Noisy State Observations

arXiv.org Artificial Intelligence

In real scenarios, the state observations an agent receives may contain measurement errors or adversarial noise, misleading the agent into taking suboptimal actions or even causing training to collapse. In this paper, we study the training robustness of distributional Reinforcement Learning (RL), a class of state-of-the-art methods that estimate the whole distribution, as opposed to only the expectation, of the total return. Firstly, we validate the contraction of distributional Bellman operators in the State-Noisy Markov Decision Process (SN-MDP), a typical tabular case that incorporates both random and adversarial state observation noise. In the noisy setting with function approximation, we then analyze the vulnerability of the least-squares loss in expectation-based RL with either linear or nonlinear function approximation. By contrast, we theoretically characterize the bounded gradient norm of the distributional RL loss based on the categorical parameterization equipped with the KL divergence. The resulting stable gradients during optimization in distributional RL account for its better training robustness against state observation noise. Finally, extensive experiments on the suite of environments verify that distributional RL is less vulnerable to both random and adversarial noisy state observations than its expectation-based counterpart.


Deep Learning of Dynamical System Parameters from Return Maps as Images

arXiv.org Artificial Intelligence

We present a novel approach to system identification (SI) using deep learning techniques. Focusing on parametric system identification (PSI), we use a supervised learning approach for estimating the parameters of discrete and continuous-time dynamical systems, irrespective of chaos. To accomplish this, we transform collections of state-space trajectory observations into image-like data to retain the state-space topology of trajectories from dynamical systems and train convolutional neural networks to estimate the parameters of dynamical systems from these images. We demonstrate that our approach can learn parameter estimation functions for various dynamical systems, and by using training-time data augmentation, we are able to learn estimation functions whose parameter estimates are robust to changes in the sample fidelity of their inputs. Once trained, these estimation models return parameter estimations for new systems with negligible time and computation costs.
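One way to picture the trajectory-to-image step is to rasterize a return map, that is, the pairs of consecutive states, onto a pixel grid. The sketch below does this for the logistic map. The choice of system, the parameter value, the grid size, and the count-based binning are all assumptions made for illustration and are not taken from the paper.

```python
def logistic_trajectory(r, x0, n):
    """Iterate the logistic map x_{k+1} = r * x_k * (1 - x_k) for n states."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

def return_map_image(xs, size):
    """Rasterize consecutive pairs (x_k, x_{k+1}) into a size x size grid
    of counts, assuming every state lies in [0, 1]."""
    image = [[0] * size for _ in range(size)]
    for x_now, x_next in zip(xs, xs[1:]):
        col = min(int(x_now * size), size - 1)    # horizontal axis: x_k
        row = min(int(x_next * size), size - 1)   # vertical axis: x_{k+1}
        image[row][col] += 1
    return image

# Hypothetical settings: r is the parameter a trained network would estimate.
xs = logistic_trajectory(r=3.9, x0=0.2, n=500)
image = return_map_image(xs, size=32)
```

A grid like `image`, paired with the parameter value that generated it, would form one supervised training example for a convolutional network of the kind the abstract describes.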


Multilingual Multiword Expression Identification Using Lateral Inhibition and Domain Adaptation

arXiv.org Artificial Intelligence

Correctly identifying multiword expressions (MWEs) is an important task for most natural language processing systems since their misidentification can result in ambiguity and misunderstanding of the underlying text. In this work, we evaluate the performance of the mBERT model for MWE identification in a multilingual context by training it on all 14 languages available in version 1.2 of the PARSEME corpus. We also incorporate lateral inhibition and language adversarial training into our methodology to create language-independent embeddings and improve its capabilities in identifying multiword expressions. The evaluation of our models shows that the approach employed in this work achieves better results compared to the best system of the PARSEME 1.2 competition, MTLB-STRUCT, on 11 out of 14 languages for global MWE identification and on 12 out of 14 languages for unseen MWE identification. Additionally, averaged across all languages, our best approach outperforms the MTLB-STRUCT system by 1.23% on global MWE identification and by 4.73% on unseen global MWE identification.