
 Dammam


Digital Twins: Initiatives, Technologies, and Use Cases in the Arab World

Communications of the ACM

Digital twins (DTs) are virtual replicas of components, assets, systems, or processes that are linked to their real-world counterparts, continuously update their states, and simulate their behavior in real time, as illustrated in Figure 1. They are adopted for monitoring, predicting, and optimizing the performance of diverse systems, bridging the gap between design, testing, and deployment. Arab R&D institutions are devoting significant effort to exporting technology that tackles challenges not only pertinent to the region but also of global importance, such as energy, sustainability, disaster management, healthcare, and urbanization. For instance, Khalifa University, UAE, is pioneering research into optical wireless communication using DTs.
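The sketch below gives a rough illustration of the sync-and-simulate loop described above. It models a hypothetical water-tank twin in Python. The class `TankTwin`, the cooling coefficient `k`, the ambient temperature, and the use of Newton's law of cooling as the physical model are all assumptions made for illustration. Real DT platforms typically ingest streaming sensor data and use far richer models.

```python
from dataclasses import dataclass, field


@dataclass
class TankTwin:
    """Hypothetical digital twin of a tank cooling toward ambient temperature."""
    temperature: float            # current estimated state (deg C)
    ambient: float = 25.0         # assumed ambient temperature (deg C)
    k: float = 0.1                # assumed cooling coefficient (1/s)
    history: list = field(default_factory=list)

    def sync(self, measured: float) -> None:
        """Overwrite the twin's state with the latest sensor reading."""
        self.temperature = measured
        self.history.append(measured)

    def simulate(self, seconds: int, dt: float = 1.0) -> list:
        """Predict future states with Newton's law of cooling (explicit Euler).

        The twin's own state is not modified, so predictions are 'what-if' runs.
        """
        t = self.temperature
        trajectory = []
        for _ in range(int(seconds / dt)):
            t += dt * self.k * (self.ambient - t)
            trajectory.append(t)
        return trajectory


twin = TankTwin(temperature=80.0)
twin.sync(78.5)                  # new reading from the physical asset
forecast = twin.simulate(60)     # one-minute look-ahead
```

The twin stays aligned with its physical counterpart if `sync()` is called on each new reading. `simulate()` then supports prediction without touching the real asset.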


Advanced Crash Causation Analysis for Freeway Safety: A Large Language Model Approach to Identifying Key Contributing Factors

arXiv.org Artificial Intelligence

Understanding the factors that contribute to traffic crashes and developing strategies to mitigate their severity is essential. Traditional statistical methods and machine learning models often struggle to capture the complex interactions among these factors and the unique characteristics of each crash. This research leverages a large language model (LLM) to analyze freeway crash data and provide a corresponding crash causation analysis. By compiling 226 traffic safety studies related to freeway crashes, a training dataset encompassing environmental, driver, traffic, and geometric design factors was created. The Llama3 8B model was fine-tuned using QLoRA to enhance its understanding of freeway crashes and their contributing factors, as covered in these studies. The fine-tuned model was then used to identify crash causation without pre-labeled data through zero-shot classification. It provided comprehensive explanations to ensure that the identified causes were reasonable and aligned with existing research. Results demonstrate that LLMs effectively identify primary crash causes such as alcohol-impaired driving, speeding, aggressive driving, and driver inattention. Incorporating event data, such as road maintenance, offers deeper insights. The model's practical applicability and its potential to improve traffic safety measures were validated by researchers in the field of traffic safety. Their questionnaire responses showed a high level of agreement (88.89%). This research highlights the complex nature of traffic crashes and shows how LLMs can be used for comprehensive analysis of crash causation and other contributing factors. It also provides insights and potential countermeasures to help planners and policymakers develop more effective and efficient traffic safety practices.
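QLoRA fine-tunes a quantized base model through low-rank adapters (LoRA). The pure-Python sketch below shows only the LoRA part. The base weight matrix `W` stays frozen, and only two small factors `A` and `B` are trained, so the effective weight is W + (alpha / r) · B A. The 4-bit quantization that distinguishes QLoRA is omitted here. The class name and the default `rank` and `alpha` values are illustrative assumptions, not the settings used in the study.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """Frozen linear layer W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight, rank=2, alpha=4.0, seed=0):
        rng = random.Random(seed)
        out_dim, in_dim = len(weight), len(weight[0])
        self.weight = weight                  # frozen base weights (out x in)
        self.scale = alpha / rank
        # A starts small and random; B starts at zero, so the adapter
        # initially leaves the base layer's output unchanged.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(out_dim)]

    def forward(self, x):
        base = matvec(self.weight, x)                 # frozen path
        delta = matvec(self.B, matvec(self.A, x))     # low-rank trainable path
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self):
        """Only A and B are trained: rank * (in_dim + out_dim) values."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])
```

Because `B` starts at zero, the adapted layer first reproduces the frozen model exactly. The trainable parameter count grows with rank × (in + out) rather than in × out, which is what makes fine-tuning an 8B-parameter model feasible on modest hardware.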


Weaponizing Language Models for Cybersecurity Offensive Operations: Automating Vulnerability Assessment Report Validation; A Review Paper

arXiv.org Artificial Intelligence

The ever-increasing sophistication of cyberwar calls for novel solutions. Large Language Models (LLMs) have emerged as a highly promising tool for both defensive and offensive cybersecurity strategies. The existing literature has focused largely on the defensive use of LLMs. Very little has been reported on their offensive use, namely Vulnerability Assessment (VA) report validation. This paper therefore tries to fill that gap by investigating the capabilities of LLMs in automating and improving the validation of VA reports. Based on a critical review of the related literature, the paper proposes a new approach to using LLMs to automate the analysis and validation of VA reports. This approach could potentially reduce the number of false positives and generally improve efficiency. These results are promising for LLM automation of VA report validation, improving accuracy and security posture while reducing human effort. The paper contributes further evidence about the offensive and defensive capabilities of LLMs and therefore helps in devising more appropriate cybersecurity strategies and tools.


Artificial Intelligence (AI) Based Prediction of Mortality, for COVID-19 Patients

arXiv.org Artificial Intelligence

For severely affected COVID-19 patients, it is crucial to identify high-risk patients and to predict survival and the need for intensive care (ICU). Most proposed models are not well reported, making them less reproducible and prone to a high risk of bias, particularly in the presence of imbalanced data/classes. In this study, the performance of nine machine and deep learning algorithms, combined with two widely used feature selection methods, was investigated for three prediction targets: last status (representing mortality), ICU requirement, and ventilation days. Fivefold cross-validation was used for training and validation. To minimize bias, the training and testing sets were split to maintain similar distributions. Only 10 of 122 features were found to be useful in prediction modelling, with acute kidney injury during hospitalization being the most important. The algorithms' performance depends on the number of features and the data pre-processing techniques. LSTM performs best in predicting last status and ICU requirement, with 90% accuracy, 92% sensitivity, 86% specificity, and 95% AUC. DNN performs best in predicting ventilation days, with 88% accuracy. The study has several limitations, including the absence of the exact time point of clinical onset. Even so, LSTM with carefully selected features can accurately predict last status and ICU requirement. An appropriate machine learning algorithm with carefully selected features and balanced data can accurately predict mortality, ICU requirement, and ventilation support. Such a model can be very useful in emergencies and pandemics where prompt and precise
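The abstract describes fivefold cross-validation with training and testing sets that keep similar distributions. It reports accuracy, sensitivity, and specificity. One common way to keep class distributions similar across folds is stratification. The minimal Python sketch below deals each class's indices round-robin into folds and computes the three metrics from a binary confusion matrix. This is a simplified stratification with no shuffling, assumed for brevity, and the function names are hypothetical. In practice a library routine such as scikit-learn's `StratifiedKFold` with shuffling would normally be used.

```python
from collections import defaultdict


def stratified_folds(labels, k=5):
    """Assign each sample index to one of k folds, dealing each class
    round-robin so every fold keeps roughly the same class ratio."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for i, idx in enumerate(indices):
            folds[i % k].append(idx)
    return folds


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall on positives), specificity (recall on negatives)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }
```

With an imbalanced outcome such as mortality, a plain random split can leave some folds with very few positives. That makes sensitivity estimates unstable, which is why class-preserving splits matter here.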


On the Impact of Multi-dimensional Local Differential Privacy on Fairness

arXiv.org Artificial Intelligence

Data collected about individuals is regularly used to make decisions that impact those same individuals. For example, census statistics have important implications for all aspects of daily life, including the allocation of political power, the distribution of federal funds, and research in economics and the social sciences. In banking, machine learning (ML) models leverage data to proactively monitor customer behavior, reduce the likelihood of false positives, and prevent fraud. In these settings, there is a tension between the need for accurate systems, in which individuals receive what they deserve, and the need to protect individuals from improper disclosure of their sensitive information. Differential privacy (DP) [23] is now widely recognized as the gold standard for providing formal guarantees on the privacy level achieved by an algorithm. However, central DP relies on the assumption of a trustworthy server. Local DP (LDP) [32] is a variant that provides privacy guarantees for each user locally, with no assumptions about third-party servers. In other words, LDP ensures that each user's data is obfuscated locally on the client side before being sent to the server. This protects data from privacy leaks on both the client side and the server side. Many big tech companies have deployed LDP-based algorithms in their industrial products (e.g., Google Chrome [24] and Apple iOS [4]).
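The sketch below makes the client-side obfuscation concrete. It implements generalized (k-ary) randomized response, a standard LDP mechanism for a single categorical attribute. Each user reports their true value with probability p = e^ε / (e^ε + k - 1) and otherwise reports a uniformly random other value. The server then inverts this noise to obtain unbiased frequency estimates. This generic mechanism was chosen for illustration and is not necessarily one of the multi-dimensional protocols the paper evaluates. The function names are hypothetical.

```python
import math
import random


def perturb(value, domain, epsilon, rng):
    """Client side: k-ary randomized response satisfying epsilon-LDP."""
    k = len(domain)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p:
        return value                                   # keep the true value
    return rng.choice([v for v in domain if v != value])  # lie uniformly


def estimate_frequencies(reports, domain, epsilon):
    """Server side: unbiased frequency estimates from perturbed reports.

    E[c_v / n] = f_v * p + (1 - f_v) * q, so f_v = (c_v / n - q) / (p - q).
    """
    k, n = len(domain), len(reports)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    q = 1.0 / (math.exp(epsilon) + k - 1)
    counts = {v: 0 for v in domain}
    for r in reports:
        counts[r] += 1
    return {v: (counts[v] / n - q) / (p - q) for v in domain}


rng = random.Random(42)
domain = ["A", "B", "C"]
true_values = ["A"] * 6000 + ["B"] * 3000 + ["C"] * 1000
reports = [perturb(v, domain, 1.0, rng) for v in true_values]
estimates = estimate_frequencies(reports, domain, 1.0)
```

The server never sees a raw value, only noisy reports. Smaller ε means more noise per report, so more users are needed to reach the same estimation accuracy.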


Let's Predict Who Will Move to a New Job

arXiv.org Artificial Intelligence

Any company's human resources department faces the challenge of predicting whether an applicant will search for a new job or stay with the company. In this paper, we discuss how machine learning (ML) can be used to predict who will move to a new job. First, the data is pre-processed into a suitable format for ML models. Categorical features are handled with data encoding. Several ML algorithms are then applied, including Random Forest (RF), Logistic Regression (LR), Decision Tree (DT), and eXtreme Gradient Boosting (XGBoost). To improve performance, the models are retrained on data rebalanced with the synthetic minority oversampling technique (SMOTE). Models are assessed using decision support metrics such as precision, recall, F1-score, and accuracy.
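SMOTE creates synthetic minority-class samples instead of duplicating existing ones. The from-scratch Python sketch below works in three steps. It picks a random minority sample and finds its k nearest minority neighbours. It then places a new point at a random position on the segment between the sample and one of those neighbours. The function name and parameters are illustrative. In practice a maintained implementation such as imbalanced-learn's `SMOTE` would normally be used, applied to the training data only so that the test set keeps its original class balance.

```python
import math
import random


def smote(minority, n_synthetic, k=3, seed=0):
    """Generate synthetic minority samples by interpolating between a sample
    and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples
        neighbours = sorted(
            (minority[j] for j in range(len(minority)) if j != i),
            key=lambda m: math.dist(base, m),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, 20, k=2)
```

Every synthetic point lies between two real minority samples. Classifiers such as RF or XGBoost therefore see a denser minority region rather than exact repeats, which reduces the overfitting that plain oversampling tends to cause.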


Deep learning approach for interruption attacks detection in LEO satellite networks

arXiv.org Artificial Intelligence

Developments in satellite communication networks require strong and effective security plans. Attacks such as denial of service (DoS) can be detected using machine learning techniques, especially under normal operational conditions. This work proposes an interruption detection strategy for Low Earth Orbit (LEO) satellite networks using deep learning algorithms. The proposed models are trained and tested on our own communication datasets. These were created from benign and malicious satellite traffic generated with the satellite network simulation platforms Omnet++ and Inet. We test several deep learning algorithms, including Multi Layer Perceptron (MLP), Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), Gated Recurrent Units (GRU), and Long Short-term Memory (LSTM). This is followed by a full analysis of the detection rate in both binary and multi-class classification. The multi-class setting covers interruption categories such as Distributed DoS (DDoS), network jamming, and meteorological disturbances. For both classification types, the detection rate exceeded 99.33% in simulated scenarios of full network surveillance. However, in more realistic scenarios, the best recorded performance was 96.12% for binary traffic detection and 94.35% for multi-class traffic detection, with a false positive rate of 3.72%. This result was achieved with a hybrid model that combines MLP and GRU. The efficiency of this deep learning approach underscores the need for machine learning methods to improve security. It also calls for more attention to solutions that facilitate data collection in LEO satellite networks.
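The best realistic result above comes from a hybrid of an MLP and a GRU, but the architecture is not detailed. The pure-Python sketch below therefore shows only generic building blocks, under stated assumptions. The first block is one GRU time step with an update gate z, a reset gate r, and a candidate state n. This follows the original GRU formulation, in which the reset gate scales the previous state before the recurrent multiplication; PyTorch's `nn.GRU` applies it after. The second block is a single hypothetical logistic output unit that stands in for the dense part. The parameter layout and function names are assumptions for illustration, not the authors' model.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def gru_step(x, h, params):
    """One GRU time step: x is the input vector, h the previous hidden state."""
    Wz, Uz, bz, Wr, Ur, br, Wn, Un, bn = params
    z = [sigmoid(a + b + c) for a, b, c in zip(matvec(Wz, x), matvec(Uz, h), bz)]
    r = [sigmoid(a + b + c) for a, b, c in zip(matvec(Wr, x), matvec(Ur, h), br)]
    rh = [ri * hi for ri, hi in zip(r, h)]  # reset gate scales the old state
    n = [math.tanh(a + b + c) for a, b, c in zip(matvec(Wn, x), matvec(Un, rh), bn)]
    # Update gate interpolates between the candidate state and the old state.
    return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


def classify_sequence(xs, params, w_out, b_out, hidden_size):
    """Run the GRU over a sequence of traffic feature vectors and return
    a malicious-traffic probability from the final hidden state."""
    h = [0.0] * hidden_size
    for x in xs:
        h = gru_step(x, h, params)
    return sigmoid(sum(w * hi for w, hi in zip(w_out, h)) + b_out)
```

The gates let the recurrent part carry information across time steps of a traffic window. A feed-forward head then maps the summary state to a decision, which is one plausible reading of an MLP/GRU hybrid.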


Building for Tomorrow: Assessing the Temporal Persistence of Text Classifiers

arXiv.org Artificial Intelligence

A supervised text classification model relies on labelled datasets for training (Sebastiani, 2002). From an experimental perspective, the design and evaluation of classification models typically rely on data from fixed periods of time. Recent research demonstrates that such models perform competitively in their experimental environment. However, they underperform when they must classify new data that is distant in time from the data observed during training (Alkhalifa and Zubiaga, 2022). This deterioration has been demonstrated for different classification tasks. These include topic classification (Rocha, Mourão, Pereira, Gonçalves, and Meira, 2008), sentiment classification (Lukes and Søgaard, 2018), hate speech detection (Florio, Basile, Polignano, Basile, and Patti, 2020), stance detection (Alkhalifa, Kochkina, and Zubiaga, 2021), and political ideology detection (Röttger and Pierrehumbert, 2021). The performance drop can happen for multiple reasons, including the evolution of language use (Smith, 2004) or of public opinion (Bonilla and Mo, 2019), and its extent may vary (Alkhalifa et al., 2021). This poses an important challenge and limitation when one plans to keep using a model over a long period of time to classify new, incoming data. This can be the case with a stream of user-generated content (Cheng, Chen, Lee, and Li, 2021).
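The evaluation setting described above trains on one period and tests on data increasingly distant in time. It can be sketched as a chronological split. The Python function below groups hypothetical `(year, text, label)` records by year and uses everything up to `train_until` for training. It then returns one test set per later year, so that a classifier's score can be tracked as the temporal gap grows. The record schema and function name are assumptions for illustration.

```python
from collections import defaultdict


def temporal_splits(records, train_until, horizon):
    """Train on records up to `train_until`; test separately on each later
    year up to `train_until + horizon`.

    `records` is an iterable of (year, text, label) tuples (hypothetical schema).
    Returns (train, tests) where tests maps year -> list of (text, label).
    """
    by_year = defaultdict(list)
    for year, text, label in records:
        by_year[year].append((text, label))
    train = [r for y in sorted(by_year) if y <= train_until for r in by_year[y]]
    tests = {
        y: by_year[y]
        for y in range(train_until + 1, train_until + horizon + 1)
        if y in by_year
    }
    return train, tests


records = [(2018, "a", 0), (2019, "b", 1), (2020, "c", 0), (2021, "d", 1), (2022, "e", 0)]
train, tests = temporal_splits(records, train_until=2019, horizon=2)
```

A random split would mix years across the training and test sets. That would hide exactly the deterioration the cited studies report, which is why each later year is kept as its own test set here.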


Introducing the newest AWS Heroes – June, 2021

#artificialintelligence

We at AWS continue to be impressed by the passion AWS enthusiasts have for knowledge sharing and supporting peer-to-peer learning in tech communities. A select few of the most influential and active community leaders in the world, who truly go above and beyond to create content and help others build better & faster on AWS, are recognized as AWS Heroes. Data Hero Anahit Pogosova is a Lead Cloud Software Engineer at Solita. She has been architecting and building software solutions with various customers for over a decade. Anahit started working with monolithic on-prem software, but has since moved all the way to the cloud, nowadays focusing mostly on AWS Data and Serverless services.


Effective Email Spam Detection System using Extreme Gradient Boosting

arXiv.org Artificial Intelligence

Email is popular, cost-effective, and makes information exchange easy for users of electronic devices. These benefits have been undermined by the rising number of unsolicited, or spam, emails. Driven by the need to protect email users from this growing menace, research on spam filtering and detection systems has been increasingly active in the last decade. However, the adaptive nature of spam has often rendered most of these systems ineffective. Several spam detection models have been reported in the literature, but their reported performance on out-of-sample test data leaves room for improvement. This research presents an improved spam detection model based on Extreme Gradient Boosting (XGBoost). To the best of our knowledge, XGBoost has received little attention for spam email detection. Experimental results show that the proposed model outperforms earlier approaches across a wide range of evaluation metrics. A thorough analysis of the model's results in comparison with those of earlier works is also presented.
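XGBoost builds an ensemble of trees using both the first and second derivatives of the loss at each round. The from-scratch Python sketch below applies that idea with depth-1 trees (stumps) on log loss. Leaf weights are w = -G / (H + λ), and splits are chosen by the corresponding gain, omitting XGBoost's constant factor and complexity penalty γ. It illustrates the objective only; it is not the `xgboost` library or the authors' model. The toy features (a link count and a binary keyword flag) are hypothetical. In practice one would use `xgboost.XGBClassifier`.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fit_stump(X, g, h, lam):
    """Pick the (feature, threshold) split with the best second-order gain;
    each leaf gets weight w = -G / (H + lam)."""
    G, H = sum(g), sum(h)
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            gl = sum(gi for gi, row in zip(g, X) if row[j] <= thr)
            hl = sum(hi for hi, row in zip(h, X) if row[j] <= thr)
            gr, hr = G - gl, H - hl
            gain = gl**2 / (hl + lam) + gr**2 / (hr + lam) - G**2 / (H + lam)
            if best is None or gain > best[0]:
                best = (gain, j, thr, -gl / (hl + lam), -gr / (hr + lam))
    _, j, thr, wl, wr = best
    return lambda row: wl if row[j] <= thr else wr


def fit_boosted(X, y, rounds=20, eta=0.3, lam=1.0):
    """Boosted stumps on log loss using gradients and hessians."""
    trees, margin = [], [0.0] * len(X)
    for _ in range(rounds):
        p = [sigmoid(m) for m in margin]
        g = [pi - yi for pi, yi in zip(p, y)]        # gradient of log loss
        h = [pi * (1 - pi) for pi in p]              # hessian of log loss
        tree = fit_stump(X, g, h, lam)
        trees.append(tree)
        margin = [m + eta * tree(row) for m, row in zip(margin, X)]
    return lambda row: sigmoid(sum(eta * t(row) for t in trees))


# Hypothetical features: [number_of_links, contains_keyword]; label 1 = spam.
X = [[0, 0], [1, 0], [0, 1], [5, 1], [6, 1], [7, 0]]
y = [0, 0, 0, 1, 1, 1]
model = fit_boosted(X, y)
```

The hessian term scales each update by the model's current confidence. The regularizer λ shrinks leaf weights, and this combination is a large part of why XGBoost often generalizes well on tabular features such as those extracted from emails.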