Collaborating Authors

 Har, Dongsoo


Frequency Hopping Synchronization by Reinforcement Learning for Satellite Communication System

arXiv.org Artificial Intelligence

Abstract: Satellite communication systems (SCSs) used for tactical purposes require robust security and anti-jamming capabilities, making frequency hopping (FH) a powerful option. However, current FH systems face challenges due to significant interference from other devices and the considerable path loss inherent in satellite communication. The resulting misalignment leads to inefficient synchronization, which is crucial for maintaining reliable communication. Traditional methods, such as those employing long short-term memory (LSTM) networks, have brought improvements, but they still struggle in the dynamic conditions of satellite environments. This paper presents a novel method for synchronizing FH signals in tactical SCSs by combining serial search and reinforcement learning to achieve coarse and fine acquisition, respectively. The mathematical analysis and simulation results demonstrate that the proposed method reduces the average number of hops required for synchronization by 58.17% and the mean squared error (MSE) of the uplink hop timing estimation by 76.95%, compared to the conventional serial search method. Compared with the early-late gate synchronization method based on serial search and an LSTM network, the average number of hops for synchronization is reduced by 12.24% and the MSE by 18.5%.

I. INTRODUCTION
Satellite communication systems (SCSs) can transmit information over long distances without being limited by geographical boundaries. This technology has become essential in both military and civilian applications, such as command and control, meteorology, remote sensing, and video broadcasting. Generally, an SCS consists of a space-based backbone network, a space-based access network, and a ground backbone network.
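A minimal sketch of the coarse-acquisition stage (serial search over candidate hop-timing offsets) may help illustrate the setting. The hop pattern length, dwell count, interference rate, and detection threshold below are illustrative assumptions, not the paper's parameters.

```python
import numpy as np

# Toy model of coarse acquisition by serial search: candidate hop-timing
# offsets are tested one by one against the known local hopping pattern.
rng = np.random.default_rng(0)

N_CHANNELS = 64          # size of the hopping channel set (assumed)
PATTERN_LEN = 32         # hops inspected per candidate alignment (assumed)
THRESHOLD = 24           # matched hops needed to declare coarse lock (assumed)

local_pattern = rng.integers(0, N_CHANNELS, size=256)   # known FH pattern
true_offset = 37                                          # unknown timing offset (in hops)

def received_hop(t: int) -> int:
    """Channel observed at hop index t; 10% of hops are hit by interference."""
    ch = local_pattern[(t + true_offset) % len(local_pattern)]
    if rng.random() < 0.1:
        ch = int(rng.integers(0, N_CHANNELS))
    return int(ch)

def serial_search() -> tuple[int, int]:
    """Test candidate offsets one by one; return (estimated offset, hops spent)."""
    t = 0  # global hop counter (time keeps advancing during the search)
    for candidate in range(len(local_pattern)):
        matches = 0
        for _ in range(PATTERN_LEN):
            if received_hop(t) == local_pattern[(t + candidate) % len(local_pattern)]:
                matches += 1
            t += 1
        if matches >= THRESHOLD:
            return candidate, t
    return -1, t

est, cost = serial_search()
print(f"estimated offset = {est}, hops used = {cost}")
```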


Continuous Adversarial Text Representation Learning for Affective Recognition

arXiv.org Artificial Intelligence

While pre-trained language models excel at semantic understanding, they often struggle to capture the nuanced affective information critical for affective recognition tasks. To address these limitations, we propose a novel framework for enhancing emotion-aware embeddings in transformer-based models. Our approach introduces a continuous valence-arousal labeling system to guide contrastive learning, which captures subtle and multidimensional emotional nuances more effectively. Furthermore, we employ a dynamic token perturbation mechanism, using gradient-based saliency to focus on sentiment-relevant tokens, improving model sensitivity to emotional cues. The experimental results demonstrate that the proposed framework outperforms existing methods, achieving up to a 15.5% improvement on the emotion classification benchmark and highlighting the importance of employing continuous labels. This improvement demonstrates that the proposed framework is effective in affective representation learning and enables precise and contextually relevant emotional understanding.
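A contrastive objective guided by continuous valence-arousal labels can be sketched as below; the Gaussian affinity weighting, temperature, and random tensors standing in for encoder outputs are assumptions for illustration, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

# Sketch: pairs that are close in valence-arousal (VA) space receive larger
# soft-positive weight in a contrastive loss over sentence embeddings.
def va_guided_contrastive_loss(embeddings: torch.Tensor,
                               va_labels: torch.Tensor,
                               temperature: float = 0.1) -> torch.Tensor:
    """
    embeddings: (B, D) sentence embeddings from the encoder
    va_labels:  (B, 2) continuous (valence, arousal) labels in [-1, 1]
    """
    z = F.normalize(embeddings, dim=-1)
    sim = z @ z.T / temperature                      # (B, B) scaled cosine similarities

    # Affinity target: closer VA labels -> larger weight (Gaussian-style kernel).
    va_dist = torch.cdist(va_labels, va_labels)      # (B, B) Euclidean distances
    affinity = torch.exp(-va_dist)
    affinity.fill_diagonal_(0.0)
    target = affinity / affinity.sum(dim=1, keepdim=True).clamp_min(1e-8)

    mask = torch.eye(z.size(0), dtype=torch.bool)    # exclude self-similarity
    log_prob = F.log_softmax(sim.masked_fill(mask, float("-inf")), dim=1)
    return -(target * log_prob).sum(dim=1).mean()

# Toy usage with random tensors standing in for transformer encoder outputs.
emb = torch.randn(8, 16, requires_grad=True)
va = torch.rand(8, 2) * 2 - 1
loss = va_guided_contrastive_loss(emb, va)
loss.backward()
print(float(loss))
```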


Cluster-based Sampling in Hindsight Experience Replay for Robotic Tasks (Student Abstract)

arXiv.org Artificial Intelligence

In multi-goal reinforcement learning with a sparse binary reward, training agents is particularly challenging due to a lack of successful experiences. To solve this problem, hindsight experience replay (HER) generates successful experiences even from unsuccessful ones. However, generating successful experiences from uniformly sampled ones is not an efficient process. In this paper, the impact of exploiting the property of achieved goals in generating successful experiences is investigated, and a novel cluster-based sampling strategy is proposed. The proposed sampling strategy groups episodes with different achieved goals by using a cluster model and samples experiences in the manner of HER to create the training batch. The proposed method is validated by experiments with three robotic control tasks of the OpenAI Gym. The experimental results demonstrate that the proposed method is substantially more sample-efficient and achieves better performance than baseline approaches.
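A short sketch of the cluster-based sampling step, assuming a KMeans cluster model over final achieved goals and an even draw of episodes across clusters; the episode format, goal dimensionality, and cluster count are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Toy replay buffer: each episode stores its final achieved goal (2-D here).
episodes = [{"achieved_goal": rng.uniform(-1, 1, size=2), "transitions": None}
            for _ in range(200)]

def cluster_based_sample(episodes, batch_episodes=16, n_clusters=4):
    """Group episodes by achieved goal, then draw evenly across clusters."""
    goals = np.stack([ep["achieved_goal"] for ep in episodes])
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(goals)

    sampled = []
    per_cluster = batch_episodes // n_clusters
    for c in range(n_clusters):
        idx = np.flatnonzero(labels == c)
        pick = rng.choice(idx, size=min(per_cluster, len(idx)), replace=False)
        sampled.extend(episodes[i] for i in pick)
    # HER-style goal relabeling (substituting achieved goals for desired goals)
    # would then be applied to the transitions of the sampled episodes.
    return sampled

batch = cluster_based_sample(episodes)
print(len(batch), "episodes sampled across clusters")
```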


Virtual Action Actor-Critic Framework for Exploration (Student Abstract)

arXiv.org Artificial Intelligence

Efficient exploration is challenging for an agent in reinforcement learning (RL). In this paper, a novel actor-critic framework, namely virtual action actor-critic (VAAC), is proposed to address the challenge of efficient exploration in RL. This work is inspired by humans' ability to imagine the potential outcomes of their actions without actually taking them. To emulate this ability, VAAC introduces a new actor called the virtual actor (VA), alongside the conventional actor-critic framework. Unlike the conventional actor, the VA takes a virtual action to anticipate the next state without interacting with the environment. With the virtual policy following a Gaussian distribution, the VA is trained to maximize the anticipated novelty of the subsequent state resulting from a virtual action. If no next state reachable by the available actions exhibits high anticipated novelty, training the VA leads to an increase in the virtual policy entropy. Hence, high virtual policy entropy indicates that there is no room left for exploration. The proposed VAAC aims to maximize a modified Q function, which combines cumulative rewards with the negative sum of virtual policy entropy. Experimental results show that VAAC improves exploration performance compared to existing algorithms.
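One way to read the modified Q function is as a bootstrapped return penalized by the virtual policy entropy at the next state; the sketch below follows that reading, and the penalty coefficient, discount, and tensor shapes are assumptions, not the paper's exact formulation.

```python
import torch

def vaac_q_target(reward: torch.Tensor,
                  done: torch.Tensor,
                  next_q: torch.Tensor,
                  virtual_entropy_next: torch.Tensor,
                  gamma: float = 0.99,
                  beta: float = 0.1) -> torch.Tensor:
    """
    reward, done:          (B,) transition reward and terminal flag
    next_q:                (B,) critic estimate Q(s', a') at the next state
    virtual_entropy_next:  (B,) entropy of the virtual (Gaussian) policy at s'
    """
    # Negative-entropy term: high virtual-policy entropy means "nothing novel
    # left to try", which lowers the value of reaching that state.
    return reward + gamma * (1.0 - done) * (next_q - beta * virtual_entropy_next)

# Toy usage for a batch of 4 transitions.
B = 4
target = vaac_q_target(torch.rand(B), torch.zeros(B), torch.rand(B), torch.rand(B))
print(target)
```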


Enhanced Transformer Architecture for Natural Language Processing

arXiv.org Artificial Intelligence

The Transformer is a state-of-the-art model in the field of natural language processing (NLP). Current NLP models primarily stack more Transformer layers to improve processing performance. However, this approach requires substantial training resources such as computing capacity. In this paper, a novel Transformer structure is proposed. It features full layer normalization, weighted residual connections, positional encoding exploiting reinforcement learning, and zero-masked self-attention. The proposed Transformer model, called the Enhanced Transformer, is validated by the bilingual evaluation understudy (BLEU) score obtained with the Multi30k translation dataset. As a result, the Enhanced Transformer achieves a 202.96% higher BLEU score than the original Transformer on this translation dataset.
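One of the listed ingredients, the weighted residual connection, can be sketched as a learnable mix between the identity branch and the sublayer branch; the single scalar gate per block used here is an assumption for illustration, not the paper's exact parameterization.

```python
import torch
import torch.nn as nn

class WeightedResidual(nn.Module):
    """Residual connection a*x + (1-a)*sublayer(x) with a learnable weight a."""
    def __init__(self, sublayer: nn.Module):
        super().__init__()
        self.sublayer = sublayer
        self.alpha = nn.Parameter(torch.tensor(0.5))   # learnable mixing weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a = torch.sigmoid(self.alpha)                   # keep the weight in (0, 1)
        return a * x + (1.0 - a) * self.sublayer(x)

# Toy usage: wrap the feed-forward sublayer of a Transformer block.
ffn = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 32))
block = WeightedResidual(ffn)
print(block(torch.randn(2, 10, 32)).shape)              # torch.Size([2, 10, 32])
```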


Sensor Fusion by Spatial Encoding for Autonomous Driving

arXiv.org Artificial Intelligence

Sensor fusion is critical to perception systems for task domains such as autonomous driving and robotics. Recently, Transformers integrated with CNNs have demonstrated high performance in sensor fusion for various perception tasks. In this work, we introduce a method for fusing data from a camera and LiDAR. By employing Transformer modules at multiple resolutions, the proposed method effectively combines local and global contextual relationships. The performance of the proposed method is validated by extensive experiments on two adversarial benchmarks with lengthy routes and high-density traffic. The proposed method outperforms previous approaches on the most challenging benchmarks, achieving significantly higher driving and infraction scores. Compared with TransFuser, it achieves 8% and 19% improvements in driving score on the Longest6 and Town05 Long benchmarks, respectively.
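The Transformer-based fusion can be illustrated at a single resolution: camera and LiDAR feature maps are flattened into tokens, exchanged through self-attention, and folded back into feature maps. The channel sizes and the shared spatial resolution below are assumptions for illustration, not the method's actual configuration.

```python
import torch
import torch.nn as nn

class FusionBlock(nn.Module):
    """Exchange information between camera and LiDAR feature maps via attention."""
    def __init__(self, channels: int = 64, heads: int = 4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=channels, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)

    def forward(self, cam: torch.Tensor, lidar: torch.Tensor):
        B, C, H, W = cam.shape
        tokens = torch.cat([cam.flatten(2).transpose(1, 2),      # (B, H*W, C)
                            lidar.flatten(2).transpose(1, 2)], dim=1)
        fused = self.encoder(tokens)                              # global token exchange
        cam_f, lidar_f = fused.split(H * W, dim=1)
        return (cam_f.transpose(1, 2).reshape(B, C, H, W),
                lidar_f.transpose(1, 2).reshape(B, C, H, W))

# Toy usage with CNN feature maps from each branch at the same resolution.
fuse = FusionBlock()
out_cam, out_lidar = fuse(torch.randn(1, 64, 8, 8), torch.randn(1, 64, 8, 8))
print(out_cam.shape, out_lidar.shape)
```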


Reinforcement Learning-Based Cooperative P2P Power Trading between DC Nanogrid Clusters with Wind and PV Energy Resources

arXiv.org Artificial Intelligence

Abstract: In replacing fossil fuels with renewable energy resources for carbon neutrality, the unbalanced resource production of intermittent wind and photovoltaic (PV) power is a critical issue for peer-to-peer (P2P) power trading. To address this issue, a reinforcement learning (RL) technique is introduced in this paper. For RL, a graph convolutional network (GCN) and a bi-directional long short-term memory (Bi-LSTM) network are jointly applied to P2P power trading between nanogrid clusters, based on cooperative game theory. The flexible and reliable DC nanogrid is suitable for integrating renewable energy into a distribution system. Each local nanogrid cluster takes the position of a prosumer, handling power production and consumption simultaneously. For the power management of a nanogrid cluster, multi-objective optimization is applied to each local nanogrid cluster with Internet of Things (IoT) technology. Charging/discharging of an electric vehicle (EV) is executed considering the intermittent characteristics of wind and PV power production. RL algorithms, such as GCN-convolutional neural network (CNN) layers for a deep Q-learning network (DQN), GCN-LSTM layers for a deep recurrent Q-learning network (DRQN), GCN-Bi-LSTM layers for DRQN, and GCN-Bi-LSTM layers for proximal policy optimization (PPO), are used for the simulations. Power management of nanogrid clusters with P2P power trading is simulated on a distribution test feeder in real time, and the proposed GCN-Bi-LSTM-PPO technique, which achieves the lowest electricity cost among the RL algorithms used for comparison, reduces the electricity cost by 36.7%, averaged over nanogrid clusters.

Keywords: Deep reinforcement learning, P2P power trading, Nanogrid, Power management, Renewable energy

I. INTRODUCTION
The widespread use of distributed energy resources (DERs) has significantly altered how energy is generated, transported, and used along the energy pipeline. A more decentralized and open electrical network is made possible by an increased number of prosumers, i.e., individuals who produce and consume energy simultaneously. In this context, new opportunities and difficulties for power systems have emerged. Peer-to-peer (P2P) power trading is a novel paradigm of distribution systems with a utility grid (UT) related to carbon neutrality and renewable energy generation [1]. P2P power trading has become a viable alternative for prosumers looking to actively participate in the energy market. Moreover, P2P trading gives end users more flexibility, increases the possibilities to use clean energy, and aids in the transition to a low-carbon energy system. In addition, the other participants in the power market can also profit by lowering the peak electricity demand, lowering operating and maintenance expenses, and enhancing the dependability of the electrical system.
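The GCN-Bi-LSTM feature extractor that would feed the PPO policy can be sketched as follows; the hand-rolled graph convolution, graph size, and feature dimensions are illustrative assumptions rather than the paper's architecture.

```python
import torch
import torch.nn as nn

class SimpleGCN(nn.Module):
    """One graph convolution mixing features across connected nanogrid clusters."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (N, T, F) node features over time; adj: (N, N) normalized adjacency
        return torch.relu(torch.einsum("nm,mtf->ntf", adj, self.lin(x)))

class GcnBiLstmEncoder(nn.Module):
    """Graph mixing followed by a Bi-LSTM over each cluster's temporal profile."""
    def __init__(self, in_dim: int = 4, hid: int = 32):
        super().__init__()
        self.gcn = SimpleGCN(in_dim, hid)
        self.bilstm = nn.LSTM(hid, hid, batch_first=True, bidirectional=True)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        h = self.gcn(x, adj)                 # (N, T, hid)
        out, _ = self.bilstm(h)              # (N, T, 2*hid)
        return out[:, -1]                    # per-cluster state for the PPO head

# Toy usage: 5 nanogrid clusters, 24-step load/generation profiles, 4 features.
N, T, F = 5, 24, 4
adj = torch.eye(N) + torch.rand(N, N) * 0.1
adj = adj / adj.sum(dim=1, keepdim=True)     # row-normalize the adjacency
enc = GcnBiLstmEncoder()
print(enc(torch.randn(N, T, F), adj).shape)  # torch.Size([5, 64])
```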


Off-Policy Reinforcement Learning with Loss Function Weighted by Temporal Difference Error

arXiv.org Artificial Intelligence

Training agents via off-policy deep reinforcement learning (RL) requires a large memory, called the replay memory, that stores past experiences used for learning. These experiences are sampled, uniformly or non-uniformly, to create the batches used for training. When calculating the loss function, off-policy algorithms assume that all samples are of the same importance. In this paper, we hypothesize that training can be enhanced by assigning a different importance to each experience, based on its temporal-difference (TD) error, directly in the training objective. We propose a novel method that introduces a weighting factor for each experience when calculating the loss function at the learning stage. In addition to improving convergence speed when used with uniform sampling, the method can be combined with prioritization methods for non-uniform sampling. Combining the proposed method with prioritization methods improves sampling efficiency while increasing the performance of TD-based off-policy RL algorithms. The effectiveness of the proposed method is demonstrated by experiments in six environments of the OpenAI Gym suite. The experimental results show that the proposed method achieves a 33%-76% reduction in the time to convergence in three environments, and an 11% increase in returns and a 3%-10% increase in success rate in the other three environments.
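The core idea, weighting each transition's loss term by a function of its own TD error, can be sketched as below; the particular weighting (normalized absolute TD error plus a floor) is an assumption for illustration, not the paper's exact formula.

```python
import torch

def td_weighted_loss(q_pred: torch.Tensor,
                     q_target: torch.Tensor,
                     eps: float = 0.1) -> torch.Tensor:
    """Critic loss where each sample is weighted by its own |TD error|."""
    td_error = q_target - q_pred                           # (B,)
    with torch.no_grad():
        w = td_error.abs() + eps                           # larger surprise -> larger weight
        w = w / w.mean()                                   # keep the overall loss scale stable
    return (w * td_error.pow(2)).mean()

# Toy usage: predictions and bootstrapped targets for a batch of 6 transitions.
q_pred = torch.randn(6, requires_grad=True)
q_target = torch.randn(6)
loss = td_weighted_loss(q_pred, q_target)
loss.backward()
print(float(loss))
```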


Reinforcement Learning for Predicting Traffic Accidents

arXiv.org Artificial Intelligence

As the demand for autonomous driving increases, ensuring safety is paramount. Early accident prediction using deep learning methods for driving safety has recently gained much attention. In this task, an early accident prediction and a point prediction of where the driver should look are determined, with the dashcam video as input. We propose to exploit the double actors and regularized critics (DARC) method, for the first time, on this accident-forecasting platform. We draw inspiration from DARC since it is currently a state-of-the-art reinforcement learning (RL) model for continuous action spaces and is suitable for accident anticipation. Results show that by utilizing DARC, we can make predictions 5% earlier on average while improving on multiple precision metrics compared to existing methods. The results imply that our RL-based problem formulation could significantly increase the safety of autonomous driving.


Kick-motion Training with DQN in AI Soccer Environment

arXiv.org Artificial Intelligence

This paper presents a technique to train a robot to perform a kick motion in AI Soccer by using reinforcement learning (RL). In RL, an agent interacts with an environment and learns to choose an action in a state at each step. When training RL algorithms, a problem called the curse of dimensionality (COD) can occur if the dimension of the state is high and the amount of training data is low. The COD often degrades the performance of RL models. In the situation of the robot kicking the ball, as the ball approaches the robot, the robot chooses an action based on the information obtained from the soccer field. To avoid the COD, the training data, which are experiences in the case of RL, should be collected evenly from all areas of the soccer field over (theoretically infinite) time. In this paper, we attempt to use a relative coordinate system (RCS) as the state for training the kick motion of the robot agent, instead of the absolute coordinate system (ACS). Using the RCS eliminates the need for the agent to know all the (state) information of the entire soccer field, reduces the dimension of the state the agent needs to perform the kick motion, and consequently alleviates the COD. The training based on the RCS is performed with the widely used deep Q-network (DQN) and tested in the AI Soccer environment implemented with the Webots simulation software.
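The absolute-to-relative coordinate conversion behind the RCS state can be sketched as a translation and rotation into a robot-centric frame; the field units and 2-D state layout below are illustrative assumptions.

```python
import numpy as np

def to_relative_state(robot_xy: np.ndarray, robot_heading: float,
                      ball_xy: np.ndarray) -> np.ndarray:
    """Return the ball position expressed in the robot-centric (relative) frame."""
    delta = ball_xy - robot_xy                       # translate to the robot's origin
    c, s = np.cos(-robot_heading), np.sin(-robot_heading)
    rot = np.array([[c, -s], [s, c]])                # rotate world frame into robot frame
    return rot @ delta

# Toy usage: two different absolute placements yield the same relative state,
# so the agent no longer needs to cover the whole field during training.
print(to_relative_state(np.array([0.0, 0.0]), 0.0, np.array([0.3, 0.1])))
print(to_relative_state(np.array([2.0, -1.0]), 0.0, np.array([2.3, -0.9])))
```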