AITopics | Depok

Collaborating Authors

Depok

Adaptive Security Policy Management in Cloud Environments Using Reinforcement Learning

Saqib, Muhammad, Mehta, Dipkumar, Yashu, Fnu, Malhotra, Shubham

arXiv.org Artificial Intelligence, May-15-2025

The security of cloud environments, such as Amazon Web Services (AWS), is complex and dynamic. Static security policies have become inadequate as threats evolve and cloud resources exhibit elasticity [1]. This paper addresses the limitations of static policies by proposing a security policy management framework that uses reinforcement learning (RL) to adapt dynamically. Specifically, we employ deep reinforcement learning algorithms, including deep Q-networks and proximal policy optimization, enabling the learning and continuous adjustment of controls such as firewall rules and Identity and Access Management (IAM) policies. The proposed RL-based solution leverages cloud telemetry data (AWS CloudTrail logs, network traffic data, threat intelligence feeds) to continuously refine security policies, maximizing threat mitigation and compliance while minimizing resource impact. Experimental results demonstrate that our adaptive RL-based framework significantly outperforms static policies, achieving higher intrusion detection rates (92% compared to 82% for static policies) and substantially reducing incident detection and response times by 58%. In addition, it maintains high conformity with security requirements and efficient resource usage.

I. INTRODUCTION
Cloud security is a critical concern as more organizations rely on cloud infrastructure. AWS and other cloud platforms provide security configurations such as firewall rules and IAM policies, which are typically managed through static policies set by administrators. However, static policies cannot adapt to the dynamic nature of cloud environments, where workloads, users, and attack patterns change rapidly [1]. This rigidity exposes cloud deployments to new threats or misconfigurations that are not covered by static rules. For instance, static firewall rules may fail to detect novel attack patterns, and fixed IAM roles may become over-privileged as resources scale, increasing risk.
Problem Statement: Traditional cloud security policy management cannot keep pace with evolving threats and agile DevOps practices. Manual policy updates are error-prone and slow.
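
To make the idea of learning a policy from feedback concrete, here is a minimal sketch of a single value update. It uses tabular Q-learning, which is a deliberate simplification and not the deep Q-network or proximal policy optimization setup the abstract describes. The state names, the action set, the reward, and the values of ALPHA and GAMMA are all hypothetical.

```python
from collections import defaultdict

ALPHA = 0.1  # learning rate (assumed value)
GAMMA = 0.9  # discount factor (assumed value)

# Hypothetical actions an agent could take on a firewall rule.
ACTIONS = ("keep_rule", "tighten_rule", "relax_rule")


def q_update(q, state, action, reward, next_state):
    """Apply one tabular Q-learning update and return the new Q-value."""
    # Value of the best action available in the next state.
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    td_target = reward + GAMMA * best_next
    # Move the current estimate a fraction ALPHA toward the target.
    q[(state, action)] += ALPHA * (td_target - q[(state, action)])
    return q[(state, action)]


# All Q-values start at 0.0.
q_table = defaultdict(float)

# Hypothetical transition: tightening a rule during suspicious traffic
# is rewarded (+1.0) and leads back to a normal-traffic state.
first = q_update(q_table, "suspicious_traffic", "tighten_rule", 1.0, "normal_traffic")
second = q_update(q_table, "suspicious_traffic", "tighten_rule", 1.0, "normal_traffic")
```

Repeating the same rewarded transition moves the estimate toward the reward step by step, which is the sense in which such an agent adjusts its controls over time.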

data mining, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2505.08837

Country:

North America > United States > New York > Suffolk County > Stony Brook (0.04)
Europe > Latvia > Riga Municipality > Riga (0.04)
Asia > Middle East > Bahrain > Capital Governorate > Manama (0.04)
(8 more...)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)

Comparative Analysis of Black-Box and White-Box Machine Learning Model in Phishing Detection

Fajar, Abdullah, Yazid, Setiadi, Budi, Indra

arXiv.org Artificial Intelligence, Dec-2-2024

Background: Explainability in phishing detection models can support further phishing mitigation by increasing trust and clarifying how phishing is detected. Objective: This study aims to determine and recommend an approach whose components can fulfil these critical needs. Methods: The methodology starts by analyzing both black-box and white-box models to identify their pros and cons specifically in phishing detection. The conclusions of the analysis are then validated experimentally using a set of well-known algorithms and public phishing datasets. The experimental metrics cover three measurements, including predictive accuracy and explainability metrics. Conclusion: Both models are comparable in terms of interpretability and consistency, with room for improvement on diverse datasets. EBM, as an example of a white-box model, is generally better suited for applications requiring explainability and actionable insights. Finally, both white-box and black-box models have positive and negative aspects in terms of performance metrics and explainability metrics. It is important to consider the objective of model usage.

explanation, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2412.02084

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Netherlands (0.04)
Asia > Nepal (0.04)
(2 more...)

Genre: Research Report > Experimental Study (0.34)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (0.95)
(3 more...)

Enhancing Phishing Detection through Feature Importance Analysis and Explainable AI: A Comparative Study of CatBoost, XGBoost, and EBM Models

Fajar, Abdullah, Yazid, Setiadi, Budi, Indra

arXiv.org Artificial Intelligence, Nov-11-2024

Phishing attacks remain a persistent threat to online security, demanding robust detection methods. This study investigates the use of machine learning to identify phishing URLs, emphasizing the crucial role of feature selection and model interpretability for improved performance. Employing Recursive Feature Elimination, the research pinpointed key features such as "length_url," "time_domain_activation," and "Page_rank" as strong indicators of phishing attempts. The study evaluated various algorithms, including CatBoost, XGBoost, and Explainable Boosting Machine, assessing their robustness and scalability. XGBoost emerged as highly efficient in terms of runtime, making it well-suited for large datasets. CatBoost, on the other hand, demonstrated resilience by maintaining high accuracy even with reduced features. To enhance transparency and trustworthiness, Explainable AI techniques, such as SHAP, were employed to provide insights into feature importance. The study's findings highlight that effective feature selection and model interpretability can significantly bolster phishing detection systems, paving the way for more efficient and adaptable defenses against evolving cyber threats.

accuracy, dataset, detection, (14 more...)

arXiv.org Artificial Intelligence

2411.0686

Country:

Asia > Indonesia > Java > West Java > Depok (0.04)
Asia > Indonesia > Java > West Java > Bandung (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.95)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.85)
(3 more...)

Can LLM Generate Culturally Relevant Commonsense QA Data? Case Study in Indonesian and Sundanese

Putri, Rifki Afina, Haznitrama, Faiz Ghifari, Adhista, Dea, Oh, Alice

arXiv.org Artificial Intelligence, Apr-16-2024

Large Language Models (LLMs) are increasingly being used to generate synthetic data for training and evaluating models. However, it is unclear whether they can generate a good-quality question answering (QA) dataset that incorporates the knowledge and cultural nuance embedded in a language, especially for low-resource languages. In this study, we investigate the effectiveness of using LLMs in generating culturally relevant commonsense QA datasets for the Indonesian and Sundanese languages. To do so, we create datasets for these languages using various methods involving both LLMs and human annotators, resulting in ~4.5K questions per language (~9K in total), making our dataset the largest of its kind. Our experiments show that automatic data adaptation from an existing English dataset is less effective for Sundanese. Interestingly, using the direct generation method on the target language, GPT-4 Turbo can generate questions with adequate general knowledge in both languages, albeit not as culturally 'deep' as humans. We also observe a higher occurrence of fluency errors in the Sundanese dataset, highlighting the discrepancy between medium- and lower-resource languages.

annotator, dataset, sundanese, (15 more...)

arXiv.org Artificial Intelligence

2402.17302

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Japan (0.05)
Asia > Indonesia > Java > Jakarta > Jakarta (0.04)
(30 more...)

Genre: Research Report > New Finding (0.88)

Industry:

Education (0.46)
Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.91)

Simple Hack for Transformers against Heavy Long-Text Classification on a Time- and Memory-Limited GPU Service

Mutasodirin, Mirza Alim, Prasojo, Radityo Eko, Abka, Achmad F., Rasyidi, Hanif

arXiv.org Artificial Intelligence, Mar-19-2024

Many NLP researchers rely on free computational services, such as Google Colab, to fine-tune their Transformer models. This limits hyperparameter optimization (HPO) in long-text classification, because the method has quadratic complexity and needs larger resources. In Indonesian, only a few works were found on long-text classification using Transformers. Most use only a small amount of data and do not report any HPO. In this study, using 18k news articles, we investigate which pretrained models are recommended based on the output length of the tokenizer. We then compare some hacks to shorten and enrich the sequences, namely the removal of stopwords, punctuation, low-frequency words, and recurring words. To get a fair comparison, we propose and run an efficient and dynamic HPO procedure that can be done gradually on a limited resource and does not require a long-running optimization library. Using the best hack found, we then compare sequence lengths of 512, 256, and 128 tokens. We find that removing stopwords while keeping punctuation and low-frequency words is the best hack. Some of our setups using only the first 128 or 256 tokens manage to outperform taking the first 512 tokens, representing the same information while requiring fewer computational resources. The findings could help developers to efficiently pursue optimal performance of the models using limited resources.
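
The best-performing hack described above (drop stopwords, keep punctuation, then take the first N tokens) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: it splits on whitespace instead of using a Transformer tokenizer, and the stopword set and the shorten function are hypothetical.

```python
# Hypothetical, deliberately tiny Indonesian stopword list for illustration.
STOPWORDS = {"yang", "dan", "di", "ke", "dari"}


def shorten(text, max_tokens, stopwords=STOPWORDS):
    """Remove stopwords, keep punctuation, and keep the first max_tokens tokens."""
    # Whitespace splitting is an assumption; a real setup would count
    # the subword tokens produced by the model's own tokenizer.
    tokens = text.split()
    kept = [tok for tok in tokens if tok.lower() not in stopwords]
    return kept[:max_tokens]


sentence = "Presiden dan menteri yang hadir di Jakarta , kemarin ."
short = shorten(sentence, max_tokens=5)
# short == ["Presiden", "menteri", "hadir", "Jakarta", ","]
```

Because stopwords are removed before truncation, the same token budget reaches further into the article, which is one plausible reading of why shorter inputs can carry the same information.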

classification, computational linguistic, epoch, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/ICAICTA59291.2023.10390269

2403.12563

Country:

North America > United States > New York > New York County > New York City (0.04)
Oceania > Australia > Australian Capital Territory > Canberra (0.04)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.04)
(4 more...)

Genre: Research Report > New Finding (0.66)

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.91)

Empirical and Experimental Insights into Data Mining Techniques for Crime Prediction: A Comprehensive Survey

Taha, Kamal

arXiv.org Artificial Intelligence, Feb-17-2024

This survey paper presents a comprehensive analysis of crime prediction methodologies, exploring the various techniques and technologies utilized in this area. The paper covers the statistical methods, machine learning algorithms, and deep learning techniques employed to analyze crime data, while also examining their effectiveness and limitations. We propose a methodological taxonomy that classifies crime prediction algorithms into specific techniques. This taxonomy is structured into four tiers: methodology category, methodology sub-category, methodology techniques, and methodology sub-techniques. Empirical and experimental evaluations are provided to rank the different techniques. The empirical evaluation assesses the crime prediction techniques based on four criteria, while the experimental evaluation ranks the algorithms that employ the same sub-technique, the different sub-techniques that employ the same technique, the different techniques that employ the same methodology sub-category, the different methodology sub-categories within the same category, and the different methodology categories. The combination of methodological taxonomy, empirical evaluations, and experimental comparisons allows for a nuanced and comprehensive understanding of crime prediction algorithms, aiding researchers in making informed decisions. Finally, the paper provides a glimpse into the future of crime prediction techniques, highlighting potential advancements and opportunities for further research in this field.

crime prediction, effectiveness, prediction accuracy, (16 more...)

arXiv.org Artificial Intelligence

2403.0078

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > United States > Virginia > Albemarle County > Charlottesville (0.14)
North America > United States > Maryland > Baltimore (0.14)
(50 more...)

Genre:

Overview (1.00)
Instructional Material (1.00)
Research Report > New Finding (0.93)

Industry:

Law > Criminal Law (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Information Technology > Security & Privacy (1.00)
(6 more...)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
(9 more...)

Cross-lingual Transfer Learning for Javanese Dependency Parsing

Ghiffari, Fadli Aulawi Al, Alfina, Ika, Azizah, Kurniawati

arXiv.org Artificial Intelligence, Jan-22-2024

While structure learning achieves remarkable performance in high-resource languages, the situation differs for under-represented languages due to the scarcity of annotated data. This study focuses on assessing the efficacy of transfer learning in enhancing dependency parsing for Javanese, a language spoken by 80 million individuals but characterized by limited representation in natural language processing. We utilized the Universal Dependencies dataset consisting of dependency treebanks from more than 100 languages, including Javanese. We propose two learning strategies to train the model: transfer learning (TL) and hierarchical transfer learning (HTL). While TL only uses a source language to pre-train the model, the HTL method uses a source language and an intermediate language in the learning process. The results show that our best model uses the HTL method, which improves performance with an increase of 10% for both UAS and LAS evaluations compared to the baseline model.
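
For readers unfamiliar with the two metrics mentioned above: the unlabeled attachment score (UAS) is the share of tokens whose predicted head is correct, and the labeled attachment score (LAS) additionally requires the correct dependency relation. A minimal sketch follows; the function name and the toy parse are hypothetical and not taken from the paper.

```python
def attachment_scores(gold, predicted):
    """Return (UAS, LAS) for one sentence.

    Each argument is a list with one (head_index, relation_label)
    pair per token, in the same token order.
    """
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equal in length")
    # UAS counts tokens whose head index matches.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    # LAS counts tokens whose head index and relation label both match.
    labeled_hits = sum(g == p for g, p in zip(gold, predicted))
    n = len(gold)
    return head_hits / n, labeled_hits / n


# Hypothetical four-token sentence; head index 0 denotes the root.
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (3, "amod")]
pred = [(2, "nsubj"), (0, "root"), (2, "nmod"), (2, "amod")]
uas, las = attachment_scores(gold, pred)
# uas == 0.75 (three correct heads), las == 0.5 (two fully correct tokens)
```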

javanese, source language, treebank, (15 more...)

arXiv.org Artificial Intelligence

2401.12072

Country:

Asia > Indonesia > Bali (0.05)
Europe > Italy (0.04)
Europe > Spain > Aragón (0.04)
(2 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (1.00)

i-Align: an interpretable knowledge graph alignment model

Trisedya, Bayu Distiawan, Salim, Flora D, Chan, Jeffrey, Spina, Damiano, Scholer, Falk, Sanderson, Mark

arXiv.org Artificial Intelligence, Aug-25-2023

Knowledge graphs (KGs) are becoming essential resources for many downstream applications. However, their incompleteness may limit their potential. Thus, continuous curation is needed to mitigate this problem. One of the strategies to address this problem is KG alignment, i.e., forming a more complete KG by merging two or more KGs. This paper proposes i-Align, an interpretable KG alignment model. Unlike the existing KG alignment models, i-Align provides an explanation for each alignment prediction while maintaining high alignment performance. Experts can use the explanation to check the correctness of the alignment prediction. Thus, the high quality of a KG can be maintained during the curation process (e.g., the merging process of two KGs). To this end, a novel Transformer-based Graph Encoder (Trans-GE) is proposed as a key component of i-Align for aggregating information from entities' neighbors (structures). Trans-GE uses Edge-gated Attention that combines the adjacency matrix and the self-attention matrix to learn a gating mechanism to control the information aggregation from the neighboring entities. It also uses historical embeddings, allowing Trans-GE to be trained over mini-batches, or smaller sub-graphs, to address the scalability issue when encoding a large KG. Another component of i-Align is a Transformer encoder for aggregating entities' attributes. This way, i-Align can generate explanations in the form of a set of the most influential attributes/neighbors based on attention weights. Extensive experiments are conducted to show the power of i-Align. The experiments include several aspects, such as the model's effectiveness for aligning KGs, the quality of the generated explanations, and its practicality for aligning large KGs. The results show the effectiveness of i-Align in these aspects.

explanation, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/s10618-023-00963-3

2308.13755

Country:

Europe > Bulgaria (0.04)
Europe > Slovakia (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(5 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

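Deteksi Sampah di Permukaan dan Dalam Perairan pada Objek Video dengan Metode Robust and Efficient Post-Processing dan Tubelet-Level Bounding Box Linking (Detection of Surface and Underwater Trash in Video Objects Using the Robust and Efficient Post-Processing Method and Tubelet-Level Bounding Box Linking)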

Tjandra, Bryan, Negara, Made S. N., Handoko, Nyoo S. C.

arXiv.org Artificial Intelligence, Jul-14-2023

Indonesia, as a maritime country, has a significant portion of its territory covered by water. Ineffective waste management has resulted in a considerable amount of trash in Indonesian waters, leading to various issues. The development of an automated trash-collecting robot can be a solution to address this problem. The robot requires a system capable of detecting objects in motion, such as in videos. However, using naive object detection methods in videos has limitations, particularly when image focus is reduced and the target object is obstructed by other objects. This paper's contribution provides an explanation of the methods that can be applied to perform video object detection in an automated trash-collecting robot. The study utilizes the YOLOv5 model and the Robust & Efficient Post Processing (REPP) method, along with tubelet-level bounding box linking on the FloW and Roboflow datasets. The combination of these methods enhances the performance of naive object detection from YOLOv5 by considering the detection results in adjacent frames. The results show that the post-processing stage and tubelet-level bounding box linking can improve the quality of detection, achieving approximately 3% better performance compared to YOLOv5 alone. The use of these methods has the potential to detect surface and underwater trash and can be applied to a real-time image-based trash-collecting robot. Implementing this system is expected to mitigate the damage caused by trash in the past and improve Indonesia's waste management system in the future.
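
The idea of using detections in adjacent frames can be illustrated with a simple overlap-based linker. This sketch swaps in plain intersection over union (IoU) with a greedy match; it does not reproduce REPP's linking score or the paper's tubelet-level procedure. The function names, box format (x1, y1, x2, y2), and the 0.5 threshold are assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def link_frames(prev_boxes, curr_boxes, threshold=0.5):
    """Link each current box to its best-overlapping previous box.

    Returns (prev_index, curr_index) pairs whose IoU is at least threshold.
    """
    links = []
    for j, curr in enumerate(curr_boxes):
        best_i, best_score = None, threshold
        for i, prev in enumerate(prev_boxes):
            score = iou(prev, curr)
            if score >= best_score:
                best_i, best_score = i, score
        if best_i is not None:
            links.append((best_i, j))
    return links


# Hypothetical detections: one object drifting right, one new far-away box.
links = link_frames([(0, 0, 10, 10)], [(1, 0, 11, 10), (50, 50, 60, 60)])
# links == [(0, 0)]
```

Linked boxes form a track across frames, so a detection that is weak in one blurred or occluded frame can be supported by its neighbors.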

artificial intelligence, machine learning, sampah, (16 more...)

arXiv.org Artificial Intelligence

2307.10039

Country:

Asia > Indonesia > Java > West Java > Depok (0.05)
Asia > China (0.05)
Asia > Indonesia > Sumatra > South Sumatra > Palembang (0.04)
Asia > Indonesia > Java > Jakarta > Jakarta (0.04)

Genre: Research Report (1.00)

Industry: Water & Waste Management > Solid Waste Management (0.54)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Vision (0.77)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

New Product Development (NPD) through Social Media-based Analysis by Comparing Word2Vec and BERT Word Embeddings

Cintaqia, Princessa, Inoue, Matheus

arXiv.org Artificial Intelligence, Apr-17-2023

This study introduces novel methods for sentiment and opinion classification of tweets to support the New Product Development (NPD) process. Two popular word embedding techniques, Word2Vec and BERT, were evaluated as inputs for classic Machine Learning and Deep Learning algorithms to identify the best-performing approach in sentiment analysis and opinion detection with limited data. The results revealed that BERT word embeddings combined with Balanced Random Forest yielded the most accurate single model for both sentiment analysis and opinion detection on a use case. Additionally, the paper provides feedback for future product development by performing word graph analysis of tweets with the same sentiment to highlight potential areas of improvement.

machine learning, natural language, tweet, (14 more...)

arXiv.org Artificial Intelligence

2304.08369

Country:

North America > United States > Hawaii (0.04)
South America > Brazil > São Paulo (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
(2 more...)

Genre: Research Report > Promising Solution (0.48)

Industry:

Transportation > Air (0.96)
Transportation > Passenger (0.71)
Consumer Products & Services > Travel (0.71)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
