Undirected Networks


Understanding Adversarial Imitation Learning in Small Sample Regime: A Stage-coupled Analysis

arXiv.org Artificial Intelligence

Imitation learning learns a policy from expert trajectories. While expert data is believed to be crucial for imitation quality, one family of imitation learning approaches, adversarial imitation learning (AIL), has been found to perform exceptionally well: with as little as a single expert trajectory, AIL can match expert performance even over a long horizon on tasks such as locomotion control. This phenomenon raises two puzzling questions. First, why can AIL perform well with only a few expert trajectories? Second, why does AIL maintain good performance regardless of the length of the planning horizon? In this paper, we theoretically explore these two questions. For a total-variation-distance-based AIL (called TV-AIL), our analysis shows a horizon-free imitation gap of $\mathcal{O}(\min\{1, \sqrt{|\mathcal{S}|/N}\})$ on a class of instances abstracted from locomotion control tasks. Here $|\mathcal{S}|$ is the state-space size of a tabular Markov decision process, and $N$ is the number of expert trajectories. We emphasize two important features of our bound. First, it is meaningful in both the small- and large-sample regimes. Second, it implies that the imitation gap of TV-AIL is at most 1 regardless of the planning horizon. Therefore, this bound can explain the empirical observation. Technically, we leverage the structure of multi-stage policy optimization in TV-AIL and present a new stage-coupled analysis via dynamic programming.
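To make the two features of the bound concrete, the sketch below evaluates $\min\{1, \sqrt{|\mathcal{S}|/N}\}$ for a few sample sizes, taking the constant hidden in $\mathcal{O}(\cdot)$ to be 1 (an assumption; the abstract does not state it). The horizon is accepted as an argument only to show that it does not enter the expression. The sketch also includes a plain total-variation distance between two discrete distributions, the quantity TV-AIL is named after; how TV-AIL applies it to the learner's and expert's distributions is not detailed in the abstract.

```python
import math
from typing import Dict, Hashable


def tv_distance(p: Dict[Hashable, float], q: Dict[Hashable, float]) -> float:
    """Total-variation distance 0.5 * sum |p(x) - q(x)| between two discrete distributions."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)


def imitation_gap_bound(num_states: int, num_trajectories: int, horizon: int, c: float = 1.0) -> float:
    """Evaluate c * min(1, sqrt(|S| / N)); the constant c = 1 is an assumption.

    `horizon` is accepted but deliberately unused: the bound is horizon-free.
    """
    del horizon  # not part of the bound
    return c * min(1.0, math.sqrt(num_states / num_trajectories))


if __name__ == "__main__":
    for n in (1, 10, 100, 10_000):
        for h in (100, 1000):
            print(f"|S|=100, N={n:>6}, H={h:>4}: bound={imitation_gap_bound(100, n, h):.3f}")
```

With $|\mathcal{S}| = 100$, the bound stays at 1 (never vacuous) until $N$ exceeds $|\mathcal{S}|$, then shrinks as $1/\sqrt{N}$, reaching 0.1 at $N = 10{,}000$; the printed values are identical for $H = 100$ and $H = 1000$.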


Evaluating Inter-Operator Cooperation Scenarios to Save Radio Access Network Energy

arXiv.org Artificial Intelligence

Reducing energy consumption is crucial to reducing humanity's debt toward our planet. Therefore, most companies try to reduce their energy consumption while preserving the service delivered to their customers. To do so, a service provider (SP) typically downscales or shuts down part of its infrastructure during low-activity periods, when only a few customers need the service. However, an SP still needs to keep part of its infrastructure "on", which still requires significant energy. For example, a mobile network operator (MNO) needs to keep most of its radio access network (RAN) active. Could an SP do better by cooperating with other SPs, which would temporarily serve its users so that it can shut down its own infrastructure, and then reciprocating during another low-activity period? To answer this question, we investigated a novel collaboration framework based on multi-agent reinforcement learning (MARL) that allows negotiations between SPs and relies on trustworthy reports from a distributed ledger technology (DLT) to evaluate the amount of energy saved. We used it to experiment with three different sets of rules (free, recommended, or imposed) regulating the negotiation among multiple SPs (3, 4, 8, or 10). With respect to four cooperation metrics (efficiency, safety, incentive compatibility, and fairness), the simulations showed that the imposed set of rules performed best.


Binary Classification with Positive Labeling Sources

arXiv.org Artificial Intelligence

To create large amounts of training labels for machine learning models effectively and efficiently, researchers have turned to Weak Supervision (WS), which uses programmatic labeling sources rather than manual annotation. Existing WS work for binary classification typically assumes labeling sources that can assign both positive and negative labels to data in roughly balanced proportions. However, for many tasks of interest with a minority positive class, negative examples can be too diverse for developers to write indicative labeling sources for them. Thus, in this work, we study WS for binary classification tasks with positive labeling sources only. We propose WEAPO, a simple yet competitive WS method for producing training labels without negative labeling sources. On 10 benchmark datasets, we show that WEAPO achieves the highest average performance in terms of both the quality of the synthesized labels and the performance of the final classifier trained on these labels. We incorporated the implementation of WEAPO into WRENCH, an existing benchmarking platform.
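The setting is easiest to see with a toy example. The sketch below shows what positive-only labeling sources look like: each one either votes positive or abstains, so negatives are never voted on explicitly. The two keyword sources, the "complaint" task, and the threshold rule are all hypothetical, and the aggregation is a naive placeholder, not WEAPO, whose actual procedure the abstract does not describe.

```python
from typing import Callable, List, Optional, Sequence

POSITIVE = 1
ABSTAIN = None  # a positive-only source either votes POSITIVE or abstains

LabelingSource = Callable[[str], Optional[int]]


def lf_mentions_refund(text: str) -> Optional[int]:
    """Hypothetical keyword source for a 'complaint' positive class."""
    return POSITIVE if "refund" in text.lower() else ABSTAIN


def lf_has_exclamation(text: str) -> Optional[int]:
    """Hypothetical punctuation source; fires on '!'."""
    return POSITIVE if "!" in text else ABSTAIN


def label_matrix(texts: Sequence[str], sources: Sequence[LabelingSource]) -> List[List[Optional[int]]]:
    """Row i holds each source's vote on texts[i] (POSITIVE or ABSTAIN)."""
    return [[src(t) for src in sources] for t in texts]


def naive_positive_score(votes: Sequence[Optional[int]]) -> float:
    """Fraction of sources voting positive. A naive baseline, NOT WEAPO."""
    return sum(v == POSITIVE for v in votes) / len(votes)


def hard_labels(texts: Sequence[str], sources: Sequence[LabelingSource],
                threshold: float = 0.5) -> List[int]:
    """Threshold the naive score; examples no source fires on become negative (0)."""
    L = label_matrix(texts, sources)
    return [1 if naive_positive_score(row) >= threshold else 0 for row in L]


if __name__ == "__main__":
    docs = ["I want a refund!", "Thanks, all good.", "Refund please"]
    print(hard_labels(docs, [lf_mentions_refund, lf_has_exclamation]))
```

Every example that no source fires on is labeled negative here purely by absence of votes. Because a positive-only setting leaves negatives implicit, a dedicated aggregation method is needed; this threshold rule only stands in for one.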


Can you hear me now? Sensitive comparisons of human and machine perception

arXiv.org Artificial Intelligence

The rise of machine-learning systems that process sensory input has brought with it a rise in comparisons between human and machine perception. But such comparisons face a challenge: whereas machine perception of some stimulus can often be probed through direct and explicit measures, much of human perceptual knowledge is latent, incomplete, or unavailable for explicit report. Here, we explore how this asymmetry can cause such comparisons to misestimate the overlap in human and machine perception. As a case study, we consider human perception of adversarial speech: synthetic audio commands that are recognized as valid messages by automated speech-recognition systems but that human listeners reportedly hear as meaningless noise. In five experiments, we adapt task designs from the human psychophysics literature to show that even when subjects cannot freely transcribe such speech commands (the previous benchmark for human understanding), they often can demonstrate other forms of understanding, including discriminating adversarial speech from closely matched non-speech (Experiments 1 and 2), finishing common phrases begun in adversarial speech (Experiments 3 and 4), and solving simple math problems posed in adversarial speech (Experiment 5), even for stimuli previously described as unintelligible to human listeners. We recommend the adoption of such "sensitive tests" when comparing human and machine perception, and we discuss the broader consequences of such approaches for assessing the overlap between systems.
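Discrimination tasks of the kind described for the first experiments are commonly scored with the signal-detection sensitivity index d', which can reveal above-chance discrimination even when free transcription fails. The sketch below computes d' from trial counts. It is a standard psychophysics measure offered for illustration; the abstract does not say which statistics the authors used. The log-linear correction (adding 0.5 to each cell) is one common choice for avoiding infinite z-scores when a hit or false-alarm rate is 0 or 1.

```python
from statistics import NormalDist


def d_prime(hits: int, misses: int, false_alarms: int, correct_rejections: int) -> float:
    """Sensitivity d' = z(hit rate) - z(false-alarm rate), with a log-linear correction.

    hits / misses: trials where the target (e.g. adversarial speech) was present.
    false_alarms / correct_rejections: trials with a matched non-speech foil.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


if __name__ == "__main__":
    print(f"chance performance: d' = {d_prime(50, 50, 50, 50):.2f}")
    print(f"good discrimination: d' = {d_prime(90, 10, 10, 90):.2f}")
```

A d' near 0 means listeners cannot tell the two stimulus classes apart; a clearly positive d' indicates latent sensitivity that a transcription test would miss.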


Inference of Affordances and Active Motor Control in Simulated Agents

arXiv.org Artificial Intelligence

Flexible, goal-directed behavior is a fundamental aspect of human life. Grounded in the free energy minimization principle, the theory of active inference formalizes the generation of such behavior from a computational neuroscience perspective. Building on this theory, we introduce an output-probabilistic, temporally predictive, modular artificial neural network architecture that processes sensorimotor information, infers behavior-relevant aspects of its world, and invokes highly flexible, goal-directed behavior. We show that our architecture, which is trained end-to-end to minimize an approximation of free energy, develops latent states that can be interpreted as affordance maps. That is, the emerging latent states signal which actions lead to which effects, depending on the local context. We further show that, combined with active inference, these emerging affordance maps can be used to invoke flexible, goal-directed behavior. As a result, our simulated agent flexibly steers through continuous spaces, avoids collisions with obstacles, and prefers pathways that lead to the goal with high certainty. Additionally, we show that the learned agent is highly suitable for zero-shot generalization across environments: after training in a handful of fixed environments with obstacles and other terrains affecting its behavior, it performs similarly well in procedurally generated environments containing different numbers of obstacles and terrains of various sizes at different locations.


Mobility-Aware Cooperative Caching in Vehicular Edge Computing Based on Asynchronous Federated and Deep Reinforcement Learning

arXiv.org Artificial Intelligence

Vehicular edge computing (VEC) can cache content in different roadside units (RSUs) at the network edge to support real-time vehicular applications. In VEC, owing to the high mobility of vehicles, it is necessary to cache user data in advance and learn which contents are most popular and interesting to vehicular users. Since user data usually contains private information, users are reluctant to share it with others. To solve this problem, traditional federated learning (FL) protects users' privacy by updating the global model synchronously, aggregating all users' local models. However, vehicles may frequently drive out of the coverage area of the VEC before they finish training their local models, so the local models cannot be uploaded as expected, which reduces the accuracy of the global model. In addition, the caching capacity of the local RSU is limited and popular contents are diverse, so the size of the predicted popular contents usually exceeds the cache capacity of the local RSU. Hence, the VEC should cache the predicted popular contents in different RSUs while considering the content transmission delay. In this paper, we take vehicle mobility into account and propose a cooperative Caching scheme in VEC based on Asynchronous Federated and deep Reinforcement learning (CAFR). We first propose a mobility-aware asynchronous FL algorithm to obtain an accurate global model, and then an algorithm that predicts popular contents based on the global model. In addition, we propose a mobility-aware deep reinforcement learning algorithm that determines the optimal cooperative caching location for the predicted popular contents in order to optimize the content transmission delay. Extensive experimental results demonstrate that the CAFR scheme outperforms other baseline caching schemes.
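The key difference from synchronous FL is that the server folds in each local model as soon as it is uploaded, instead of waiting for every vehicle, so a vehicle that leaves coverage does not stall the round. A minimal sketch of that idea, assuming a FedAsync-style rule in which a local model's weight decays with its staleness; the abstract does not give CAFR's actual aggregation rule or how it uses vehicle mobility, so `alpha`, the exponent `a`, and the decay function are illustrative assumptions.

```python
from typing import List, Sequence


def staleness_weight(staleness: int, a: float = 0.5) -> float:
    """Polynomial decay (1 + staleness) ** -a: older local models count for less (assumed rule)."""
    return (1.0 + staleness) ** (-a)


def async_update(global_w: Sequence[float], local_w: Sequence[float],
                 alpha: float, staleness: int) -> List[float]:
    """Mix one vehicle's local model into the global model as soon as it arrives.

    `staleness` = number of global updates since the vehicle downloaded its copy.
    """
    beta = alpha * staleness_weight(staleness)
    return [(1.0 - beta) * g + beta * l for g, l in zip(global_w, local_w)]


if __name__ == "__main__":
    w = [0.0, 0.0]
    # (local weights, staleness) pairs arriving one at a time from different vehicles
    arrivals = [([1.0, 1.0], 0), ([2.0, 0.0], 3)]
    for local, tau in arrivals:
        w = async_update(w, local, alpha=0.5, staleness=tau)
        print(w)
```

The second, staler model enters with half the mixing weight of the first, which is the usual trade-off between using late uploads and not letting outdated models drag the global model back.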


SOCIALGYM: A Framework for Benchmarking Social Robot Navigation

arXiv.org Artificial Intelligence

The ability of robots to move safely and in a socially compliant manner in dynamic human environments is an essential benchmark for long-term robot autonomy. However, it is not feasible to learn and benchmark social navigation behaviors entirely in the real world, as learning is data-intensive and it is challenging to make safety guarantees during training. Therefore, simulation-based benchmarks that provide abstractions for social navigation are required. A framework for these benchmarks would need to support a wide variety of learning approaches, be extensible to the broad range of social navigation scenarios, and abstract away the perception problem to focus on social navigation explicitly. While many solutions have been proposed, including high-fidelity 3D simulators and grid-world approximations, no existing solution satisfies all of the aforementioned properties for learning and evaluating social navigation behaviors. In this work, we propose SOCIALGYM, a lightweight 2D simulation environment for robot social navigation designed with extensibility in mind, and a benchmark scenario built on SOCIALGYM. Further, we present benchmark results that compare human-engineered and model-based learning approaches with a suite of off-the-shelf Learning from Demonstration (LfD) and Reinforcement Learning (RL) approaches applied to social robot navigation. These results characterize the data efficiency, task performance, social compliance, and environment-transfer capabilities of each evaluated policy, providing a solid grounding for future social navigation research.


Learning an Interpretable Model for Driver Behavior Prediction with Inductive Biases

arXiv.org Artificial Intelligence

To plan safe maneuvers and act with foresight, autonomous vehicles must be able to accurately predict the uncertain future. In autonomous driving, deep neural networks have been successfully applied to learning predictive models of human driving behavior from data. However, their predictions suffer from cascading errors, resulting in large inaccuracies over long time horizons. Furthermore, the learned models are black boxes, so it is often unclear how they arrive at their predictions. In contrast, rule-based models, which are informed by human experts, maintain long-term coherence in their predictions and are human-interpretable. However, such models often lack the expressiveness needed to capture complex real-world dynamics. In this work, we begin to close this gap by embedding the Intelligent Driver Model, a popular hand-crafted driver model, into deep neural networks. Our model's transparency can offer considerable advantages, for example when debugging the model or interpreting its predictions. We evaluate our approach on a simulated merging scenario, showing that it yields a robust model that is end-to-end trainable and provides greater transparency at no cost to predictive accuracy.
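For reference, the Intelligent Driver Model computes a follower's acceleration from its speed $v$, the gap $s$ to the leading vehicle, and the approach rate $\Delta v = v - v_{\text{lead}}$: $\dot v = a\,[1 - (v/v_0)^\delta - (s^*/s)^2]$ with desired gap $s^* = s_0 + vT + v\,\Delta v/(2\sqrt{ab})$. The sketch below implements that formula. The default parameter values are typical illustrative settings, not values from the paper; in a hybrid model, a network could plausibly predict these parameters per driver, but that is an assumption about what the embedding might look like, not the authors' stated architecture.

```python
import math


def idm_acceleration(v: float, gap: float, dv: float,
                     v0: float = 30.0,    # desired speed [m/s]
                     T: float = 1.5,      # desired time headway [s]
                     s0: float = 2.0,     # minimum jam distance [m]
                     a_max: float = 1.0,  # maximum acceleration [m/s^2]
                     b: float = 1.5,      # comfortable deceleration [m/s^2]
                     delta: float = 4.0   # acceleration exponent
                     ) -> float:
    """Intelligent Driver Model acceleration [m/s^2].

    v: own speed, gap: bumper-to-bumper distance to the leader (> 0),
    dv: approach rate v - v_lead (positive when closing in).
    """
    s_star = s0 + v * T + v * dv / (2.0 * math.sqrt(a_max * b))
    return a_max * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)


if __name__ == "__main__":
    print(idm_acceleration(v=0.0, gap=1e9, dv=0.0))    # free road, standing start
    print(idm_acceleration(v=20.0, gap=10.0, dv=5.0))  # closing fast on a near leader
```

On an empty road a stopped vehicle accelerates at roughly `a_max`, a vehicle at its desired speed `v0` stops accelerating, and closing in on a near leader produces strong braking: the long-term coherence the abstract attributes to rule-based models.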


Adversarial Robustness Verification and Attack Synthesis in Stochastic Systems

arXiv.org Artificial Intelligence

Probabilistic model checking is a useful technique for specifying and verifying properties of stochastic systems, including randomized protocols and reinforcement learning models. Existing methods rely on the assumed structure and probabilities of certain system transitions. These assumptions may be incorrect and may even be violated by an adversary who gains control of system components. In this paper, we develop a formal framework for adversarial robustness in systems modeled as discrete-time Markov chains (DTMCs). We base our framework on existing methods for verifying probabilistic temporal logic properties and extend it to include deterministic, memoryless policies acting in Markov decision processes (MDPs). Our framework includes a flexible approach for specifying structure-preserving and non-structure-preserving adversarial models. We outline a class of threat models under which adversaries can perturb system transitions, constrained by an $\varepsilon$-ball around the original transition probabilities. We define three main DTMC adversarial robustness problems: adversarial robustness verification, maximal $\delta$ synthesis, and worst-case attack synthesis. We present two optimization-based solutions to these three problems, leveraging traditional and parametric probabilistic model checking techniques. We then evaluate our solutions on two stochastic protocols and a collection of Grid World case studies, which model an agent acting in an environment described as an MDP. We find that the parametric solution yields fast computation for small parameter spaces. For less restrictive (stronger) adversaries, the number of parameters increases, and directly computing property satisfaction probabilities scales better. We demonstrate the usefulness of our definitions and solutions by comparing system outcomes across various properties, threat models, and case studies.
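To make the threat model concrete, the sketch below computes the probability of eventually reaching a target set in a small DTMC by fixed-point (value) iteration, then lets an adversary move at most $\varepsilon$ of probability mass between two existing successors of one state. Restricting the shift to transitions that already exist, and keeping them positive, is one simple reading of a structure-preserving perturbation. The chain, state meanings, and $\varepsilon$ are made-up examples, and this hand-evaluated attack is not the paper's optimization-based or parametric method.

```python
from typing import Dict, List, Set

DTMC = List[Dict[int, float]]  # P[s][t] = probability of moving from s to t


def reach_probability(P: DTMC, target: Set[int], iters: int = 1000) -> List[float]:
    """Probability of eventually reaching `target` from each state (value iteration)."""
    x = [1.0 if s in target else 0.0 for s in range(len(P))]
    for _ in range(iters):
        x = [1.0 if s in target else sum(p * x[t] for t, p in P[s].items())
             for s in range(len(P))]
    return x


def shift_mass(P: DTMC, s: int, src: int, dst: int, delta: float, eps: float) -> DTMC:
    """Move `delta` probability from s->src to s->dst, staying inside the eps-ball.

    Structure-preserving: both transitions must already exist and src must stay positive,
    so no transition is created or deleted and each entry changes by at most eps.
    """
    if not 0.0 <= delta <= eps:
        raise ValueError("perturbation outside the eps-ball")
    if src not in P[s] or dst not in P[s] or P[s][src] - delta <= 0.0:
        raise ValueError("perturbation would change the transition structure")
    Q = [dict(row) for row in P]  # copy so the nominal chain is untouched
    Q[s][src] -= delta
    Q[s][dst] += delta
    return Q


if __name__ == "__main__":
    # state 0 = start, 1 = goal (absorbing), 2 = failure (absorbing)
    P: DTMC = [{0: 0.5, 1: 0.4, 2: 0.1}, {1: 1.0}, {2: 1.0}]
    print("nominal:", reach_probability(P, {1})[0])
    attacked = shift_mass(P, s=0, src=1, dst=2, delta=0.05, eps=0.05)
    print("attacked:", reach_probability(attacked, {1})[0])
```

The attack lowers the reachability probability from about 0.8 to about 0.7 (analytically $0.4/0.5$ and $0.35/0.5$). Worst-case attack synthesis would search over all admissible shifts rather than trying one by hand.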


Voice Analysis for Stress Detection and Application in Virtual Reality to Improve Public Speaking in Real-time: A Review

arXiv.org Artificial Intelligence

Stress during public speaking is common and adversely affects performance and self-confidence. Extensive research has been carried out to develop models that recognize emotional states. However, little research has addressed detecting stress during public speaking in real time using voice analysis. In this context, the current review shows that the application of such algorithms has not been properly explored, and it identifies the main obstacles to creating a suitable testing environment given current complexities and limitations. In this paper, we present our main idea and propose a computational model for stress detection that could be integrated into a Virtual Reality (VR) application to create an intelligent virtual audience for improving public speaking skills. When integrated with VR, the developed model will be able to detect excessive stress in real time by analysing voice features correlated with physiological indicators of stress, helping users gradually control excessive stress and improve their public speaking performance.
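Fundamental frequency (F0, heard as pitch) is one of the voice features commonly studied as a stress correlate. As a minimal illustration of the kind of frame-level feature a real-time pipeline might extract, the sketch below estimates F0 with a plain autocorrelation peak search over an assumed 75-400 Hz range. The range, sample rate, and frame length are illustrative assumptions, and the review does not specify which features or algorithm its proposed model uses.

```python
import math
from typing import Sequence


def estimate_f0(frame: Sequence[float], sample_rate: int,
                f_min: float = 75.0, f_max: float = 400.0) -> float:
    """Estimate the fundamental frequency [Hz] of one voiced frame via autocorrelation.

    Searches lags corresponding to [f_min, f_max] and returns sample_rate / best_lag.
    """
    min_lag = int(sample_rate / f_max)
    max_lag = min(int(sample_rate / f_min), len(frame) - 1)
    best_lag, best_r = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # unnormalized autocorrelation at this lag
        r = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag


if __name__ == "__main__":
    fs = 8000
    # synthetic 200 Hz "voice" frame of 1024 samples (128 ms)
    frame = [math.sin(2 * math.pi * 200 * n / fs) for n in range(1024)]
    print(f"F0 ~ {estimate_f0(frame, fs):.1f} Hz")
```

In a real-time setting this would run on successive short frames, with a stress model tracking changes relative to the speaker's baseline; plain autocorrelation is also prone to octave errors on real speech, which is why dedicated pitch trackers exist.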