Plotting

Results


Detect Malicious JavaScript Code Using Machine Learning

#artificialintelligence

In this article, we will consider approaches to detect obfuscated JavaScript code snippets using machine learning. Most websites use JavaScript (JS) code to make dynamic content; thus, JS code becomes a valuable attack vector against browsers, browser plug-ins, email clients, and other JS applications. Among common JS-based attacks are drive-by-download, cross-site scripting (XSS), cross-site request forgery (XSRF), malvertising/malicious advertising, and others. Most of the malicious JS codes are obfuscated in order to hide what they are doing and to avoid being detected by signature-based security systems. In other words, the obfuscation technique is a sequence of confusing code transformations to compromise its understandability, but at the same time to save its functionality.


Identifying Cyber Threats Before They Happen: Deep Learning

#artificialintelligence

Crypto.com, Microsoft, NVidia, and Okta all got hacked this year. In some hacks, attackers are looking to take data, while some are just trying things out. Either way, it is in the interest of companies to patch up the holes in their security systems as more attackers are learning to take advantage of them. The project I am working on now is one to prevent cyber threats like these from happening. When a company is hacked, there is a lot at stake.


A Comprehensive Survey on Radio Frequency (RF) Fingerprinting: Traditional Approaches, Deep Learning, and Open Challenges

arXiv.org Artificial Intelligence

Fifth generation (5G) networks and beyond envisions massive Internet of Things (IoT) rollout to support disruptive applications such as extended reality (XR), augmented/virtual reality (AR/VR), industrial automation, autonomous driving, and smart everything which brings together massive and diverse IoT devices occupying the radio frequency (RF) spectrum. Along with spectrum crunch and throughput challenges, such a massive scale of wireless devices exposes unprecedented threat surfaces. RF fingerprinting is heralded as a candidate technology that can be combined with cryptographic and zero-trust security measures to ensure data privacy, confidentiality, and integrity in wireless networks. Motivated by the relevance of this subject in the future communication networks, in this work, we present a comprehensive survey of RF fingerprinting approaches ranging from a traditional view to the most recent deep learning (DL) based algorithms. Existing surveys have mostly focused on a constrained presentation of the wireless fingerprinting approaches, however, many aspects remain untold. In this work, however, we mitigate this by addressing every aspect - background on signal intelligence (SIGINT), applications, relevant DL algorithms, systematic literature review of RF fingerprinting techniques spanning the past two decades, discussion on datasets, and potential research avenues - necessary to elucidate this topic to the reader in an encyclopedic manner.


Adversarial Attacks against Windows PE Malware Detection: A Survey of the State-of-the-Art

arXiv.org Artificial Intelligence

The malware has been being one of the most damaging threats to computers that span across multiple operating systems and various file formats. To defend against the ever-increasing and ever-evolving threats of malware, tremendous efforts have been made to propose a variety of malware detection methods that attempt to effectively and efficiently detect malware. Recent studies have shown that, on the one hand, existing ML and DL enable the superior detection of newly emerging and previously unseen malware. However, on the other hand, ML and DL models are inherently vulnerable to adversarial attacks in the form of adversarial examples, which are maliciously generated by slightly and carefully perturbing the legitimate inputs to confuse the targeted models. Basically, adversarial attacks are initially extensively studied in the domain of computer vision, and some quickly expanded to other domains, including NLP, speech recognition and even malware detection. In this paper, we focus on malware with the file format of portable executable (PE) in the family of Windows operating systems, namely Windows PE malware, as a representative case to study the adversarial attack methods in such adversarial settings. To be specific, we start by first outlining the general learning framework of Windows PE malware detection based on ML/DL and subsequently highlighting three unique challenges of performing adversarial attacks in the context of PE malware. We then conduct a comprehensive and systematic review to categorize the state-of-the-art adversarial attacks against PE malware detection, as well as corresponding defenses to increase the robustness of PE malware detection. We conclude the paper by first presenting other related attacks against Windows PE malware detection beyond the adversarial attacks and then shedding light on future research directions and opportunities.


Selecting the suitable resampling strategy for imbalanced data classification regarding dataset properties

arXiv.org Artificial Intelligence

In many application domains such as medicine, information retrieval, cybersecurity, social media, etc., datasets used for inducing classification models often have an unequal distribution of the instances of each class. This situation, known as imbalanced data classification, causes low predictive performance for the minority class examples. Thus, the prediction model is unreliable although the overall model accuracy can be acceptable. Oversampling and undersampling techniques are well-known strategies to deal with this problem by balancing the number of examples of each class. However, their effectiveness depends on several factors mainly related to data intrinsic characteristics, such as imbalance ratio, dataset size and dimensionality, overlapping between classes or borderline examples. In this work, the impact of these factors is analyzed through a comprehensive comparative study involving 40 datasets from different application areas. The objective is to obtain models for automatic selection of the best resampling strategy for any dataset based on its characteristics. These models allow us to check several factors simultaneously considering a wide range of values since they are induced from very varied datasets that cover a broad spectrum of conditions. This differs from most studies that focus on the individual analysis of the characteristics or cover a small range of values. In addition, the study encompasses both basic and advanced resampling strategies that are evaluated by means of eight different performance metrics, including new measures specifically designed for imbalanced data classification. The general nature of the proposal allows the choice of the most appropriate method regardless of the domain, avoiding the search for special purpose techniques that could be valid for the target data.


Artificial Intelligence Ethics and Safety: practical tools for creating "good" models

arXiv.org Artificial Intelligence

The AI Robotics Ethics Society (AIRES) is a non-profit organization founded in 2018 by Aaron Hui to promote awareness and the importance of ethical implementation and regulation of AI. AIRES is now an organization with chapters at universities such as UCLA (Los Angeles), USC (University of Southern California), Caltech (California Institute of Technology), Stanford University, Cornell University, Brown University, and the Pontifical Catholic University of Rio Grande do Sul (Brazil). AIRES at PUCRS is the first international chapter of AIRES, and as such, we are committed to promoting and enhancing the AIRES Mission. Our mission is to focus on educating the AI leaders of tomorrow in ethical principles to ensure that AI is created ethically and responsibly. As there are still few proposals for how we should implement ethical principles and normative guidelines in the practice of AI system development, the goal of this work is to try to bridge this gap between discourse and praxis. Between abstract principles and technical implementation. In this work, we seek to introduce the reader to the topic of AI Ethics and Safety. At the same time, we present several tools to help developers of intelligent systems develop "good" models. This work is a developing guide published in English and Portuguese. Contributions and suggestions are welcome.


Artificial Intellgence -- Application in Life Sciences and Beyond. The Upper Rhine Artificial Intelligence Symposium UR-AI 2021

arXiv.org Artificial Intelligence

The TriRhenaTech alliance presents the accepted papers of the 'Upper-Rhine Artificial Intelligence Symposium' held on October 27th 2021 in Kaiserslautern, Germany. Topics of the conference are applications of Artificial Intellgence in life sciences, intelligent systems, industry 4.0, mobility and others. The TriRhenaTech alliance is a network of universities in the Upper-Rhine Trinational Metropolitan Region comprising of the German universities of applied sciences in Furtwangen, Kaiserslautern, Karlsruhe, Offenburg and Trier, the Baden-Wuerttemberg Cooperative State University Loerrach, the French university network Alsace Tech (comprised of 14 'grandes \'ecoles' in the fields of engineering, architecture and management) and the University of Applied Sciences and Arts Northwestern Switzerland. The alliance's common goal is to reinforce the transfer of knowledge, research, and technology, as well as the cross-border mobility of students.


Explainable AI for B5G/6G: Technical Aspects, Use Cases, and Research Challenges

arXiv.org Artificial Intelligence

When 5G began its commercialisation journey around 2020, the discussion on the vision of 6G also surfaced. Researchers expect 6G to have higher bandwidth, coverage, reliability, energy efficiency, lower latency, and, more importantly, an integrated "human-centric" network system powered by artificial intelligence (AI). Such a 6G network will lead to an excessive number of automated decisions made every second. These decisions can range widely, from network resource allocation to collision avoidance for self-driving cars. However, the risk of losing control over decision-making may increase due to high-speed data-intensive AI decision-making beyond designers and users' comprehension. The promising explainable AI (XAI) methods can mitigate such risks by enhancing the transparency of the black box AI decision-making process. This survey paper highlights the need for XAI towards the upcoming 6G age in every aspect, including 6G technologies (e.g., intelligent radio, zero-touch network management) and 6G use cases (e.g., industry 5.0). Moreover, we summarised the lessons learned from the recent attempts and outlined important research challenges in applying XAI for building 6G systems. This research aligns with goals 9, 11, 16, and 17 of the United Nations Sustainable Development Goals (UN-SDG), promoting innovation and building infrastructure, sustainable and inclusive human settlement, advancing justice and strong institutions, and fostering partnership at the global level.


Using Shapley Values and Variational Autoencoders to Explain Predictive Models with Dependent Mixed Features

arXiv.org Machine Learning

Explainable artificial intelligence (XAI) and interpretable machine learning (IML) have become active research fields in recent years (Adadi and Berrada 2018; Molnar 2019). This is a natural consequence as complex machine learning (ML) models are now applied to solve supervised learning problems in many high-risk areas: cancer prognosis (Kourou et al. 2015), credit scoring (Kvamme et al. 2018), and money laundering detection (Jullum, Løland, et al. 2020). The high prediction accuracy of complex ML models often comes at the expense of model interpretability. As the goal of science is to gain knowledge from the collected data, the use of black-box models hinders the understanding of the underlying relationship between the features and the response, and thereby curtail scientific discovery. Model explanation frameworks from the XAI field extract the hidden knowledge about the underlying data structure captured by a black-box model, and thereby make the model's decision-making process transparent. This is crucial for, e.g., medical researchers that apply an ML model to obtain well-performing predictions, but who simultaneously also strive to discover important risk factors. Another driving factor is the Right to Explanation legislation in EU's General Data Protection Regulation (GDPR) (European Commission 2016).


Inter-Domain Fusion for Enhanced Intrusion Detection in Power Systems: An Evidence Theoretic and Meta-Heuristic Approach

arXiv.org Artificial Intelligence

False alerts due to misconfigured/ compromised IDS in ICS networks can lead to severe economic and operational damage. To solve this problem, research has focused on leveraging deep learning techniques that help reduce false alerts. However, a shortcoming is that these works often require or implicitly assume the physical and cyber sensors to be trustworthy. Implicit trust of data is a major problem with using artificial intelligence or machine learning for CPS security, because during critical attack detection time they are more at risk, with greater likelihood and impact, of also being compromised. To address this shortcoming, the problem is reframed on how to make good decisions given uncertainty. Then, the decision is detection, and the uncertainty includes whether the data used for ML-based IDS is compromised. Thus, this work presents an approach for reducing false alerts in CPS power systems by dealing uncertainty without the knowledge of prior distribution of alerts. Specifically, an evidence theoretic based approach leveraging Dempster Shafer combination rules are proposed for reducing false alerts. A multi-hypothesis mass function model is designed that leverages probability scores obtained from various supervised-learning classifiers. Using this model, a location-cum-domain based fusion framework is proposed and evaluated with different combination rules, that fuse multiple evidence from inter-domain and intra-domain sensors. The approach is demonstrated in a cyber-physical power system testbed with Man-In-The-Middle attack emulation in a large-scale synthetic electric grid. For evaluating the performance, plausibility, belief, pignistic, etc. metrics as decision functions are considered. To improve the performance, a multi-objective based genetic algorithm is proposed for feature selection considering the decision metrics as the fitness function.