Goto

Collaborating Authors

 Overview


Hierarchical Windowed Graph Attention Network and a Large Scale Dataset for Isolated Indian Sign Language Recognition

arXiv.org Artificial Intelligence

Automatic Sign Language (SL) recognition is an important task in the computer vision community. To build a robust SL recognition system, we need a considerable amount of data which is lacking particularly in Indian sign language (ISL). In this paper, we propose a large-scale isolated ISL dataset and a novel SL recognition model based on skeleton graph structure. The dataset covers 2,002 daily used common words in the deaf community recorded by 20 (10 male and 10 female) deaf adult signers (contains 40033 videos). We propose a SL recognition model namely Hierarchical Windowed Graph Attention Network (HWGAT) by utilizing the human upper body skeleton graph structure. The HWGAT tries to capture distinctive motions by giving attention to different body parts induced by the human skeleton graph structure. The utility of the proposed dataset and the usefulness of our model are evaluated through extensive experiments. We pre-trained the proposed model on the proposed dataset and fine-tuned it across different sign language datasets further boosting the performance of 1.10, 0.46, 0.78, and 6.84 percentage points on INCLUDE, LSA64, AUTSL and WLASL respectively compared to the existing state-of-the-art skeleton-based models.


Explainable Post hoc Portfolio Management Financial Policy of a Deep Reinforcement Learning agent

arXiv.org Artificial Intelligence

Financial portfolio management investment policies computed quantitatively by modern portfolio theory techniques like the Markowitz model rely on a set on assumptions that are not supported by data in high volatility markets. Hence, quantitative researchers are looking for alternative models to tackle this problem. Concretely, portfolio management is a problem that has been successfully addressed recently by Deep Reinforcement Learning (DRL) approaches. In particular, DRL algorithms train an agent by estimating the distribution of the expected reward of every action performed by an agent given any financial state in a simulator. However, these methods rely on Deep Neural Networks model to represent such a distribution, that although they are universal approximator models, they cannot explain its behaviour, given by a set of parameters that are not interpretable. Critically, financial investors policies require predictions to be interpretable, so DRL agents are not suited to follow a particular policy or explain their actions. In this work, we developed a novel Explainable Deep Reinforcement Learning (XDRL) approach for portfolio management, integrating the Proximal Policy Optimization (PPO) with the model agnostic explainable techniques of feature importance, SHAP and LIME to enhance transparency in prediction time. By executing our methodology, we can interpret in prediction time the actions of the agent to assess whether they follow the requisites of an investment policy or to assess the risk of following the agent suggestions. To the best of our knowledge, our proposed approach is the first explainable post hoc portfolio management financial policy of a DRL agent. We empirically illustrate our methodology by successfully identifying key features influencing investment decisions, which demonstrate the ability to explain the agent actions in prediction time.


On the use of Probabilistic Forecasting for Network Analysis in Open RAN

arXiv.org Artificial Intelligence

Unlike other single-point Artificial Intelligence (AI)-based prediction techniques, such as Long-Short Term Memory (LSTM), probabilistic forecasting techniques (e.g., DeepAR and Transformer) provide a range of possible outcomes and associated probabilities that enable decision makers to make more informed and robust decisions. At the same time, the architecture of Open RAN has emerged as a revolutionary approach for mobile networks, aiming at openness, interoperability and innovation in the ecosystem of RAN. In this paper, we propose the use of probabilistic forecasting techniques as a radio App (rApp) within the Open RAN architecture. We investigate and compare different probabilistic and single-point forecasting methods and algorithms to estimate the utilization and resource demands of Physical Resource Blocks (PRBs) of cellular base stations. Through our evaluations, we demonstrate the numerical advantages of probabilistic forecasting techniques over traditional single-point forecasting methods and show that they are capable of providing more accurate and reliable estimates. In particular, DeepAR clearly outperforms single-point forecasting techniques such as LSTM and Seasonal-Naive (SN) baselines and other probabilistic forecasting techniques such as Simple-Feed-Forward (SFF) and Transformer neural networks.


Operating System And Artificial Intelligence: A Systematic Review

arXiv.org Artificial Intelligence

In the dynamic landscape of technology, the convergence of Artificial Intelligence (AI) and Operating Systems (OS) has emerged as a pivotal arena for innovation. Our exploration focuses on the symbiotic relationship between AI and OS, emphasizing how AI-driven tools enhance OS performance, security, and efficiency, while OS advancements facilitate more sophisticated AI applications. We delve into various AI techniques employed to optimize OS functionalities, including memory management, process scheduling, and intrusion detection. Simultaneously, we analyze the role of OS in providing essential services and infrastructure that enable effective AI application execution, from resource allocation to data processing. The article also addresses challenges and future directions in this domain, emphasizing the imperative of secure and efficient AI integration within OS frameworks. By examining case studies and recent developments, our review provides a comprehensive overview of the current state of AI-OS integration, underscoring its significance in shaping the next generation of computing technologies. Finally, we explore the promising prospects of Intelligent OSes, considering not only how innovative OS architectures will pave the way for groundbreaking opportunities but also how AI will significantly contribute to advancing these next-generation OSs.


Data Poisoning: An Overlooked Threat to Power Grid Resilience

arXiv.org Artificial Intelligence

As the complexities of Dynamic Data Driven Applications Systems increase, preserving their resilience becomes more challenging. For instance, maintaining power grid resilience is becoming increasingly complicated due to the growing number of stochastic variables (such as renewable outputs) and extreme weather events that add uncertainty to the grid. Current optimization methods have struggled to accommodate this rise in complexity. This has fueled the growing interest in data-driven methods used to operate the grid, leading to more vulnerability to cyberattacks. One such disruption that is commonly discussed is the adversarial disruption, where the intruder attempts to add a small perturbation to input data in order to "manipulate" the system operation. During the last few years, work on adversarial training and disruptions on the power system has gained popularity. In this paper, we will first review these applications, specifically on the most common types of adversarial disruptions: evasion and poisoning disruptions. Through this review, we highlight the gap between poisoning and evasion research when applied to the power grid. This is due to the underlying assumption that model training is secure, leading to evasion disruptions being the primary type of studied disruption. Finally, we will examine the impacts of data poisoning interventions and showcase how they can endanger power grid resilience.


Unveiling the Decision-Making Process in Reinforcement Learning with Genetic Programming

arXiv.org Artificial Intelligence

Despite tremendous progress, machine learning and deep learning still suffer from incomprehensible predictions. Incomprehensibility, however, is not an option for the use of (deep) reinforcement learning in the real world, as unpredictable actions can seriously harm the involved individuals. In this work, we propose a genetic programming framework to generate explanations for the decision-making process of already trained agents by imitating them with programs. Programs are interpretable and can be executed to generate explanations of why the agent chooses a particular action. Furthermore, we conduct an ablation study that investigates how extending the domain-specific language by using library learning alters the performance of the method. We compare our results with the previous state of the art for this problem and show that we are comparable in performance but require much less hardware resources and computation time.


Neuromuscular Modeling for Locomotion with Wearable Assistive Robots -- A primer

arXiv.org Artificial Intelligence

Wearable assistive robots (WR) for the lower extremity are extensively documented in literature. Various interfaces have been designed to control these devices during gait and balance activities. However, achieving seamless and intuitive control requires accurate modeling of the human neuromusculoskeletal (NMSK) system. Such modeling enables WR to anticipate user intentions and determine the necessary joint assistance. Despite the existence of controllers interfacing with the NMSK system, robust and generalizable techniques across different tasks remain scarce. Designing these novel controllers necessitates the combined expertise of neurophysiologists, who understand the physiology of movement initiation and generation, and biomechatronic engineers, who design and control devices that assist movement. This paper aims to bridge the gaps between these fields by presenting a primer on key concepts and the current state of the science in each area. We present three main sections: the neuromechanics of locomotion, neuromechanical models of movement, and existing neuromechanical controllers used in WR. Through these sections, we provide a comprehensive overview of seminal studies in the field, facilitating collaboration between neurophysiologists and biomechatronic engineers for future advances in wearable robotics for locomotion.


Multimodal Misinformation Detection using Large Vision-Language Models

arXiv.org Artificial Intelligence

The increasing proliferation of misinformation and its alarming impact have motivated both industry and academia to develop approaches for misinformation detection and fact checking. Recent advances on large language models (LLMs) have shown remarkable performance in various tasks, but whether and how LLMs could help with misinformation detection remains relatively underexplored. Most of existing state-of-the-art approaches either do not consider evidence and solely focus on claim related features or assume the evidence to be provided. Few approaches consider evidence retrieval as part of the misinformation detection but rely on fine-tuning models. In this paper, we investigate the potential of LLMs for misinformation detection in a zero-shot setting. We incorporate an evidence retrieval component into the process as it is crucial to gather pertinent information from various sources to detect the veracity of claims. To this end, we propose a novel re-ranking approach for multimodal evidence retrieval using both LLMs and large vision-language models (LVLM). The retrieved evidence samples (images and texts) serve as the input for an LVLM-based approach for multimodal fact verification (LVLM4FV). To enable a fair evaluation, we address the issue of incomplete ground truth for evidence samples in an existing evidence retrieval dataset by annotating a more complete set of evidence samples for both image and text retrieval. Our experimental results on two datasets demonstrate the superiority of the proposed approach in both evidence retrieval and fact verification tasks and also better generalization capability across dataset compared to the supervised baseline.


Differential Privacy of Cross-Attention with Provable Guarantee

arXiv.org Artificial Intelligence

Cross-attention has become a fundamental module nowadays in many important artificial intelligence applications, e.g., retrieval-augmented generation (RAG), system prompt, guided stable diffusion, and many so on. Ensuring cross-attention privacy is crucial and urgently needed because its key and value matrices may contain sensitive information about companies and their users, many of which profit solely from their system prompts or RAG data. In this work, we design a novel differential privacy (DP) data structure to address the privacy security of cross-attention with a theoretical guarantee. In detail, let $n$ be the input token length of system prompt/RAG data, $d$ be the feature dimension, $0 < \alpha \le 1$ be the relative error parameter, $R$ be the maximum value of the query and key matrices, $R_w$ be the maximum value of the value matrix, and $r,s,\epsilon_s$ be parameters of polynomial kernel methods. Then, our data structure requires $\widetilde{O}(ndr^2)$ memory consumption with $\widetilde{O}(nr^2)$ initialization time complexity and $\widetilde{O}(\alpha^{-1} r^2)$ query time complexity for a single token query. In addition, our data structure can guarantee that the user query is $(\epsilon, \delta)$-DP with $\widetilde{O}(n^{-1} \epsilon^{-1} \alpha^{-1/2} R^{2s} R_w r^2)$ additive error and $n^{-1} (\alpha + \epsilon_s)$ relative error between our output and the true answer. Furthermore, our result is robust to adaptive queries in which users can intentionally attack the cross-attention system. To our knowledge, this is the first work to provide DP for cross-attention. We believe it can inspire more privacy algorithm design in large generative models (LGMs).


Frontiers of Deep Learning: From Novel Application to Real-World Deployment

arXiv.org Artificial Intelligence

Deep learning continues to re-shape numerous fields, from natural language processing and imaging to data analytics and recommendation systems. This report studies two research papers that represent recent progress on deep learning from two largely different aspects: The first paper applied the transformer networks, which are typically used in language models, to improve the quality of synthetic aperture radar image by effectively reducing the speckle noise. The second paper presents an in-storage computing design solution to enable cost-efficient and high-performance implementations of deep learning recommendation systems. In addition to summarizing each paper in terms of motivation, key ideas and techniques, and evaluation results, this report also presents thoughts and discussions about possible future research directions. By carrying out in-depth study on these two representative papers and related references, this doctoral candidate has developed better understanding on the far-reaching impact and efficient implementation of deep learning models.