Goto

Collaborating Authors

 Shah, Ankit


Importance of negative sampling in weak label learning

arXiv.org Artificial Intelligence

Weak-label learning is a challenging task that requires learning from data "bags" containing positive and negative instances, but only the bag labels are known. The pool of negative instances is usually larger than positive instances, thus making selecting the most informative negative instance critical for performance. Such a selection strategy for negative instances from each bag is an open problem that has not been well studied for weak-label learning. In this paper, we study several sampling strategies that can measure the usefulness of negative instances for weak-label learning and select them accordingly. We test our method on CIFAR-10 and AudioSet datasets and show that it improves the weak-label classification performance and reduces the computational cost compared to random sampling methods. Our work reveals that negative instances are not all equally irrelevant, and selecting them wisely can benefit weak-label learning.


Deep PackGen: A Deep Reinforcement Learning Framework for Adversarial Network Packet Generation

arXiv.org Artificial Intelligence

Recent advancements in artificial intelligence (AI) and machine learning (ML) algorithms, coupled with the availability of faster computing infrastructure, have enhanced the security posture of cybersecurity operations centers (defenders) through the development of ML-aided network intrusion detection systems (NIDS). Concurrently, the abilities of adversaries to evade security have also increased with the support of AI/ML models. Therefore, defenders need to proactively prepare for evasion attacks that exploit the detection mechanisms of NIDS. Recent studies have found that the perturbation of flow-based and packet-based features can deceive ML models, but these approaches have limitations. Perturbations made to the flow-based features are difficult to reverse-engineer, while samples generated with perturbations to the packet-based features are not playable. Our methodological framework, Deep PackGen, employs deep reinforcement learning to generate adversarial packets and aims to overcome the limitations of approaches in the literature. By taking raw malicious network packets as inputs and systematically making perturbations on them, Deep PackGen camouflages them as benign packets while still maintaining their functionality. In our experiments, using publicly available data, Deep PackGen achieved an average adversarial success rate of 66.4\% against various ML models and across different attack types. Our investigation also revealed that more than 45\% of the successful adversarial samples were out-of-distribution packets that evaded the decision boundaries of the classifiers. The knowledge gained from our study on the adversary's ability to make specific evasive perturbations to different types of malicious packets can help defenders enhance the robustness of their NIDS against evolving adversarial attacks.


Automated Audio Captioning and Language-Based Audio Retrieval

arXiv.org Artificial Intelligence

This project involved participation in the DCASE 2022 Competition (Task 6) which had two subtasks: (1) Automated Audio Captioning and (2) Language-Based Audio Retrieval. The first subtask involved the generation of a textual description for audio samples, while the goal of the second was to find audio samples within a fixed dataset that match a given description. For both subtasks, the Clotho dataset was used. The models were evaluated on BLEU1, BLEU2, BLEU3, ROUGEL, METEOR, CIDEr, SPICE, and SPIDEr scores for audio captioning and R1, R5, R10 and mARP10 scores for audio retrieval. We have conducted a handful of experiments that modify the baseline models for these tasks. Our final architecture for Automated Audio Captioning is slightly better than the baseline performance, while our model for Language-Based Audio Retrieval has surpassed its counterpart.


Improving Perceptual Quality, Intelligibility, and Acoustics on VoIP Platforms

arXiv.org Artificial Intelligence

In this paper, we present a method for fine-tuning models trained on the Deep Noise Suppression (DNS) 2020 Challenge to improve their performance on Voice over Internet Protocol (VoIP) applications. Our approach involves adapting the DNS 2020 models to the specific acoustic characteristics of VoIP communications, which includes distortion and artifacts caused by compression, transmission, and platform-specific processing. To this end, we propose a multi-task learning framework for VoIP-DNS that jointly optimizes noise suppression and VoIP-specific acoustics for speech enhancement. We evaluate our approach on a diverse VoIP scenarios and show that it outperforms both industry performance and state-of-the-art methods for speech enhancement on VoIP applications. Our results demonstrate the potential of models trained on DNS-2020 to be improved and tailored to different VoIP platforms using VoIP-DNS, whose findings have important applications in areas such as speech recognition, voice assistants, and telecommunication.


Exploiting Contextual Structure to Generate Useful Auxiliary Tasks

arXiv.org Artificial Intelligence

Reinforcement learning requires interaction with an environment, which is expensive for robots. This constraint necessitates approaches that work with limited environmental interaction by maximizing the reuse of previous experiences. We propose an approach that maximizes experience reuse while learning to solve a given task by generating and simultaneously learning useful auxiliary tasks. To generate these tasks, we construct an abstract temporal logic representation of the given task and leverage large language models to generate context-aware object embeddings that facilitate object replacements. Counterfactual reasoning and off-policy methods allow us to simultaneously learn these auxiliary tasks while solving the given target task. We combine these insights into a novel framework for multitask reinforcement learning and experimentally show that our generated auxiliary tasks share similar underlying exploration requirements as the given task, thereby maximizing the utility of directed exploration. Our approach allows agents to automatically learn additional useful policies without extra environment interaction.


Approach to Learning Generalized Audio Representation Through Batch Embedding Covariance Regularization and Constant-Q Transforms

arXiv.org Artificial Intelligence

General-purpose embedding is highly desirable for few-shot even zero-shot learning in many application scenarios, including the audio tasks. In order to understand representations better, we conducted thorough error analysis and visualization of HEAR 2021 submission results. Inspired by the analysis, this work experiments with different front-end audio preprocessing methods, including Constant-Q Transform (CQT) and Short-time Fourier transform (STFT), and proposes a Batch Embedding Covariance Regularization (BECR) term to uncover a more holistic simulation of the frequency information received by the human auditory system. We tested the models on the suite of HEAR 2021 tasks, which encompass a broad category of tasks. Preliminary results show (1) the proposed BECR can incur a more dispersed embedding on the test set, (2) BECR improves the PaSST model without extra computation complexity, and (3) STFT preprocessing outperforms CQT in all tasks we tested.


Skill Transfer for Temporally-Extended Task Specifications

arXiv.org Artificial Intelligence

Deploying robots in real-world domains, such as households and flexible manufacturing lines, requires the robots to be taskable on demand. Linear temporal logic (LTL) is a widely-used specification language with a compositional grammar that naturally induces commonalities across tasks. However, the majority of prior research on reinforcement learning with LTL specifications treats every new formula independently. We propose LTL-Transfer, a novel algorithm that enables subpolicy reuse across tasks by segmenting policies for training tasks into portable transition-centric skills capable of satisfying a wide array of unseen LTL specifications while respecting safety-critical constraints. Experiments in a Minecraft-inspired domain show that LTL-Transfer can satisfy over 90% of 500 unseen tasks after training on only 50 task specifications and never violating a safety constraint. We also deployed LTL-Transfer on a quadruped mobile manipulator in an analog household environment to demonstrate its ability to transfer to many fetch and delivery tasks in a zero-shot fashion.


Temporal Logic Imitation: Learning Plan-Satisficing Motion Policies from Demonstrations

arXiv.org Artificial Intelligence

In prior work, learning from demonstration (LfD) [1, 2] has successfully enabled robots to accomplish multi-step tasks by segmenting demonstrations (primarily of robot end-effector or tool trajectories) into sub-tasks/goals [3, 4, 5, 6, 7, 8], phases [9, 10], keyframes [11, 12], or skills/primitives/options [13, 14, 15, 16]. Most of these abstractions assume reaching subgoals sequentially will deliver the desired outcomes; however, successful imitation of many manipulation tasks with spatial/temporal constraints cannot be reduced to imitation at the motion level unless the learned motion policy also satisfies these constraints. This becomes highly relevant if we want robots to not only imitate but also generalize, adapt and be robust to perturbations imposed by humans, who are in the loop of task learning and execution. LfD techniques that learn stable motion policies with convergence guarantees (e.g., Dynamic Movement Primitives (DMP) [17], Dynamical Systems (DS) [18]) are capable of providing such desired properties but only at the motion level. As shown in Figure 1 (a-b) a robot can successfully replay a soup-scooping task while being robust to physical perturbations with a learned DS. Nevertheless, if the spoon orientation is perturbed to a state where all material is dropped, as seen in Figure 1 (c), the motion policy will still lead the robot to the target, unaware of the task-level failure or how to recover from it.


On the pragmatism of using binary classifiers over data intensive neural network classifiers for detection of COVID-19 from voice

arXiv.org Artificial Intelligence

In a self-assesment study, COVID patients reported difficulty producing certain voiced sounds and noticed changes in Lately, there has been a global effort by multiple research groups their voice [8]. to detect COVID-19 from voice. Different researchers use different Consequently, a number of research groups around the world kinds of information from the voice signal to achieve this. Various have initiated efforts on attempting to diagnose potential Covid infections types of phonated sounds and the sound of cough and breath have from recordings of vocalizations [9, 5]. While most groups all been used with varying degree of success in automated voice have focused on cough sounds [10, 11, 12] as they are a frequent based COVID-19 detection apps. In this paper, we show that detecting symptom of Covid-19, several groups have also considered other COVID-19 from voice does not require custom made nonstandard vocalizations, such as breathing sounds [10, 13] extended vowels features or complicated neural network classifiers rather it [14, 15, 16], and counts. Yet other teams have analyzed free-form can be successfully done with just standard features and simple binary speech such as those obtainable from YouTube recordings[17].


An Overview of Techniques for Biomarker Discovery in Voice Signal

arXiv.org Artificial Intelligence

This paper reflects on the effect of several categories of medical conditions on human voice, focusing on those that may be hypothesized to have effects on voice, but for which the changes themselves may be subtle enough to have eluded observation in standard analytical examinations of the voice signal. It presents three categories of techniques that can potentially uncover such elusive biomarkers and allow them to be measured and used for predictive and diagnostic purposes. These approaches include proxy techniques, model-based analytical techniques and data-driven AI techniques.