Tardif, Pierre-Martin
Preventing Jailbreak Prompts as Malicious Tools for Cybercriminals: A Cyber Defense Perspective
Tshimula, Jean Marie, Ndona, Xavier, Nkashama, D'Jeff K., Tardif, Pierre-Martin, Kabanza, Froduald, Frappier, Marc, Wang, Shengrui
Jailbreak prompts pose a significant threat in AI and cybersecurity, as they are crafted to bypass ethical safeguards in large language models, potentially enabling misuse by cybercriminals. This paper analyzes jailbreak prompts from a cyber defense perspective, exploring techniques like prompt injection and context manipulation that allow harmful content generation, content filter evasion, and sensitive information extraction. We assess the impact of successful jailbreaks, from misinformation and automated social engineering to hazardous content creation, including bioweapons and explosives. To address these threats, we propose strategies involving advanced prompt analysis, dynamic safety protocols, and continuous model fine-tuning to strengthen AI resilience. Additionally, we highlight the need for collaboration among AI researchers, cybersecurity experts, and policymakers to set standards for protecting AI systems. Through case studies, we illustrate these cyber defense approaches, promoting responsible AI practices to maintain system integrity and public trust. \textbf{\color{red}Warning: This paper contains content which the reader may find offensive.}
Deep Learning for Network Anomaly Detection under Data Contamination: Evaluating Robustness and Mitigating Performance Degradation
Nkashama, D'Jeff K., Fรฉlicien, Jordan Masakuna, Soltani, Arian, Verdier, Jean-Charles, Tardif, Pierre-Martin, Frappier, Marc, Kabanza, Froduald
Deep learning (DL) has emerged as a crucial tool in network anomaly detection (NAD) for cybersecurity. While DL models for anomaly detection excel at extracting features and learning patterns from data, they are vulnerable to data contamination -- the inadvertent inclusion of attack-related data in training sets presumed benign. This study evaluates the robustness of six unsupervised DL algorithms against data contamination using our proposed evaluation protocol. Results demonstrate significant performance degradation in state-of-the-art anomaly detection algorithms when exposed to contaminated data, highlighting the critical need for self-protection mechanisms in DL-based NAD models. To mitigate this vulnerability, we propose an enhanced auto-encoder with a constrained latent representation, allowing normal data to cluster more densely around a learnable center in the latent space. Our evaluation reveals that this approach exhibits improved resistance to data contamination compared to existing methods, offering a promising direction for more robust NAD systems.
Robustness Evaluation of Deep Unsupervised Learning Algorithms for Intrusion Detection Systems
Nkashama, D'Jeff Kanda, Soltani, Arian, Verdier, Jean-Charles, Frappier, Marc, Tardif, Pierre-Martin, Kabanza, Froduald
Recently, advances in deep learning have been observed in various fields, including computer vision, natural language processing, and cybersecurity. Machine learning (ML) has demonstrated its ability as a potential tool for anomaly detection-based intrusion detection systems to build secure computer networks. Increasingly, ML approaches are widely adopted than heuristic approaches for cybersecurity because they learn directly from data. Data is critical for the development of ML systems, and becomes potential targets for attackers. Basically, data poisoning or contamination is one of the most common techniques used to fool ML models through data. This paper evaluates the robustness of six recent deep learning algorithms for intrusion detection on contaminated data. Our experiments suggest that the state-of-the-art algorithms used in this study are sensitive to data contamination and reveal the importance of self-defense against data perturbation when developing novel models, especially for intrusion detection systems.
Characterizing Financial Market Coverage using Artificial Intelligence
Tshimula, Jean Marie, Nkashama, D'Jeff K., Owusu, Patrick, Frappier, Marc, Tardif, Pierre-Martin, Kabanza, Froduald, Brun, Armelle, Patenaude, Jean-Marc, Wang, Shengrui, Chikhaoui, Belkacem
This paper scrutinizes a database of over 4900 YouTube videos to characterize financial market coverage. Financial market coverage generates a large number of videos. Therefore, watching these videos to derive actionable insights could be challenging and complex. In this paper, we leverage Whisper, a speech-to-text model from OpenAI, to generate a text corpus of market coverage videos from Bloomberg and Yahoo Finance. We employ natural language processing to extract insights regarding language use from the market coverage. Moreover, we examine the prominent presence of trending topics and their evolution over time, and the impacts that some individuals and organizations have on the financial market. Our characterization highlights the dynamics of the financial market coverage and provides valuable insights reflecting broad discussions regarding recent financial events and the world economy.