Goto

Collaborating Authors

 Government


Binary and Multiclass Cyberattack Classification on GeNIS Dataset

arXiv.org Artificial Intelligence

The integration of Artificial Intelligence (AI) in Network Intrusion Detection Systems (NIDS) is a promising approach to tackle the increasing sophistication of cyberattacks. However, since Machine Learning (ML) and Deep Learning (DL) models rely heavily on the quality of their training data, the lack of diverse and up-to-date datasets hinders their generalization capability to detect malicious activity in previously unseen network traffic. This study presents an experimental validation of the reliability of the GeNIS dataset for AI-based NIDS, to serve as a baseline for future benchmarks. Five feature selection methods, Information Gain, Chi-Squared Test, Recursive Feature Elimination, Mean Absolute Deviation, and Dispersion Ratio, were combined to identify the most relevant features of GeNIS and reduce its dimensionality, enabling a more computationally efficient detection. Three decision tree ensembles and two deep neural networks were trained for both binary and multiclass classification tasks. All models reached high accuracy and F1-scores, and the ML ensembles achieved slightly better generalization while remaining more efficient than DL models. Overall, the obtained results indicate that the GeNIS dataset supports intelligent intrusion detection and cy-berattack classification with time-based and quantity-based behavioral features.


Bio AI Agent: A Multi-Agent Artificial Intelligence System for Autonomous CAR-T Cell Therapy Development with Integrated Target Discovery, Toxicity Prediction, and Rational Molecular Design

arXiv.org Artificial Intelligence

Chimeric antigen receptor T-cell (CAR-T) therapy represents a paradigm shift in cancer treatment, yet development timelines of 8-12 years and clinical attrition rates exceeding 40-60% highlight critical inefficiencies in target selection, safety assessment, and molecular optimization. We present Bio AI Agent, a multi-agent artificial intelligence system powered by large language models that enables autonomous CAR-T development through collaborative specialized agents. The system comprises six autonomous agents: Target Selection Agent for multi-parametric antigen prioritization across >10,000 cancer-associated targets, Toxicity Prediction Agent for comprehensive safety profiling integrating tissue expression atlases and pharmacovigilance databases, Molecular Design Agent for rational CAR engineering, Patent Intelligence Agent for freedom-to-operate analysis, Clinical Translation Agent for regulatory compliance, and Decision Orchestration Agent for multi-agent coordination. Retrospective validation demonstrated autonomous identification of high-risk targets including FcRH5 (hepatotoxicity) and CD229 (off-tumor toxicity), patent infringement risks for CD38+SLAMF7 combinations, and generation of comprehensive development roadmaps. By enabling parallel processing, specialized reasoning, and autonomous decision-making superior to monolithic AI systems, Bio AI Agent addresses critical gaps in precision oncology development and has potential to accelerate translation of next-generation immunotherapies from discovery to clinic.


Pattern Recognition of Scrap Plastic Misclassification in Global Trade Data

arXiv.org Artificial Intelligence

We propose an interpretable machine learning framework to help identify trade data discrepancies that are challenging to detect with traditional methods. Our system analyzes trade data to find a novel inverse price-volume signature, a pattern where reported volumes increase as average unit prices decrease. The model achieves 0.9375 accuracy and was validated by comparing large-scale UN data with detailed firm-level data, confirming that the risk signatures are consistent. This scalable tool provides customs authorities with a transparent, data-driven method to shift from conventional to priority-based inspection protocols, translating complex data into actionable intelligence to support international environmental policies.


OKBench: Democratizing LLM Evaluation with Fully Automated, On-Demand, Open Knowledge Benchmarking

arXiv.org Artificial Intelligence

Knowledge-intensive question answering is central to large language models (LLMs) and is typically assessed using static benchmarks derived from sources like Wikipedia and textbooks. However, these benchmarks fail to capture evolving knowledge in a dynamic world, and centralized curation struggles to keep pace with rapid LLM advancements. To address these drawbacks, we propose Open Knowledge Bench (OKBench), a fully automated framework for generating high-quality, dynamic knowledge benchmarks on demand. Focusing on the news domain where knowledge updates daily, OKBench is an agentic framework that automates the sourcing, creation, validation, and distribution of benchmarks. Our approach democratizes benchmark creation and facilitates thorough evaluation of retrieval-augmented methods by reducing overlap with pretraining data. We evaluate our framework on a wide range open-source and proprietary LLMs of various sizes and configurations, both with and without retrieval over freshly generated knowledge. Our results reveal distinct model behaviors when confronted with new information and highlight how retrieval narrows the performance gap between small and large models. These findings underscore the importance of evaluating LLMs on evolving knowledge benchmarks.


Diverse Preference Learning for Capabilities and Alignment

arXiv.org Artificial Intelligence

The ability of LLMs to represent diverse perspectives is critical as they increasingly impact society. However, recent studies reveal that alignment algorithms such as RLHF and DPO significantly reduce the diversity of LLM outputs. Not only do aligned LLMs generate text with repetitive structure and word choice, they also approach problems in more uniform ways, and their responses reflect a narrower range of societal perspectives. We attribute this problem to the KL divergence regularizer employed in preference learning algorithms. This causes the model to systematically overweight majority opinions and sacrifice diversity in its outputs. To address this, we propose Soft Preference Learning, which decouples the entropy and cross-entropy terms in the KL penalty -- allowing for fine-grained control over LLM generation diversity. From a capabilities perspective, LLMs trained using Soft Preference Learning attain higher accuracy on difficult repeated sampling tasks and produce outputs with greater semantic and lexical diversity. From an alignment perspective, they are capable of representing a wider range of societal viewpoints and display improved logit calibration. Notably, Soft Preference Learning resembles, but is a Pareto improvement over, standard temperature scaling. As LLMs become integrated into how people consume information (Bick et al., 2024) and approach tasks (Deloitte, 2024), their ability to represent diverse perspectives is critical. For example, consider an LLM answering the following multiple-choice question: The best way to reduce income inequality is: (A) Increase minimum wage (B) Expand access to education and job training (C) Implement universal basic income (D) Lower taxes on the wealthy to stimulate job creation Imagine a survey showing people's preferences as: A (55%), B (20%), C (15%), and D (10%). How should an LLM respond to this question? Ideally, we may prefer it to reflect the range of views in the population. If an LLM assigns 99% probability to majority option A, it fails to represent the diversity of perspectives. With LLMs becoming important information sources, this may reinforce dominant narratives at the expense of minority views. However, recent studies show that alignment algorithms such as RLHF and DPO significantly reduce the diversity of LLM outputs. This leads to mode collapse towards majority preferences, as the example above shows (Kirk et al., 2024; Padmakumar & He, 2024; Rafailov et al., 2024; Christiano et al., 2023). In a generative setting, this results in repetitive responses, as illustrated in Figure 1. For example, the DPO model frequently uses the same doctor's name and 1 We highlight Doctor name, gender, and textual aberration features shown in the plots on the right. DPO responses are well-formed but lack diversity (e.g.


The Collective Turing Test: Large Language Models Can Generate Realistic Multi-User Discussions

arXiv.org Artificial Intelligence

Large Language Models (LLMs) offer new avenues to simulate online communities and social media. Potential applications range from testing the design of content recommendation algorithms to estimating the effects of content policies and interventions. However, the validity of using LLMs to simulate conversations between various users remains largely untested. We evaluated whether LLMs can convincingly mimic human group conversations on social media. We collected authentic human conversations from Reddit and generated artificial conversations on the same topic with two LLMs: Llama 3 70B and GPT-4o. When presented side-by-side to study participants, LLM-generated conversations were mistaken for human-created content 39\% of the time. In particular, when evaluating conversations generated by Llama 3, participants correctly identified them as AI-generated only 56\% of the time, barely better than random chance. Our study demonstrates that LLMs can generate social media conversations sufficiently realistic to deceive humans when reading them, highlighting both a promising potential for social simulation and a warning message about the potential misuse of LLMs to generate new inauthentic social media content.


Explainable Federated Learning for U.S. State-Level Financial Distress Modeling

arXiv.org Artificial Intelligence

We present the first application of federated learning (FL) to the U.S. National Financial Capability Study, introducing an interpretable framework for predicting consumer financial distress across all 50 states and the District of Columbia without centralizing sensitive data. Our cross-silo FL setup treats each state as a distinct data silo, simulating real-world governance in nationwide financial systems. Unlike prior work, our approach integrates two complementary explainable AI techniques to identify both global (nationwide) and local (state-specific) predictors of financial hardship, such as contact from debt collection agencies. We develop a machine learning model specifically suited for highly categorical, imbalanced survey data. This work delivers a scalable, regulation-compliant blueprint for early warning systems in finance, demonstrating how FL can power socially responsible AI applications in consumer credit risk and financial inclusion.


Back to the Future: The Role of Past and Future Context Predictability in Incremental Language Production

arXiv.org Artificial Intelligence

Contextual predictability shapes both the form and choice of words in online language production. The effects of the predictability of a word given its previous context are generally well-understood in both production and comprehension, but studies of naturalistic production have also revealed a poorly-understood backward predictability effect of a word given its future context, which may be related to future planning. Here, in two studies of naturalistic speech corpora, we investigate backward predictability effects using improved measures and more powerful language models, introducing a new principled and conceptually motivated information-theoretic predictability measure that integrates predictability from both the future and the past context. Our first study revisits classic predictability effects on word duration. Our second study investigates substitution errors within a generative framework that independently models the effects of lexical, contextual, and communicative factors on word choice, while predicting the actual words that surface as speech errors. We find that our proposed conceptually-motivated alternative to backward predictability yields qualitatively similar effects across both studies. Through a fine-grained analysis of substitution errors, we further show that different kinds of errors are suggestive of how speakers prioritize form, meaning, and context-based information during lexical planning. Together, these findings illuminate the functional roles of past and future context in how speakers encode and choose words, offering a bridge between contextual predictability effects and the mechanisms of sentence planning.


The Future of AI in the GCC Post-NPM Landscape: A Comparative Analysis of Kuwait and the UAE

arXiv.org Artificial Intelligence

Comparative evidence of how two Gulf Cooperation Council (GCC) states translate artificial intelligence (AI) ambitions into post-New Public Management (post-NPM) outcomes are scarce because most studies focus on Western democracies. To fill this gap, we examine constitutional, collective choice, and operational rules that shape AI uptake in two contrasting GCC members, the United Arab Emirates (UAE) and Kuwait, and whether they foster citizen centricity, collaborative governance, and public value creation. Anchored in Ostrom's Institutional Analysis and Development framework, the study integrates a most similar/ most different systems design with multiple sources: 62 public documents issued between 2018 and 2025, embedded UAE cases (Smart Dubai and MBZUAI), and 39 interviews with officials conducted from Aug 2024 to May 2025. Dual coding and process tracing connect rule configurations to AI performance. Our cross-case analysis identifies four mutually reinforcing mechanisms behind divergent trajectories. In the UAE, concentrated authority, credible sanctions, pro-innovation narratives, and flexible reinvestment rules transform pilots into hundreds of operating services and significant recycled savings. Kuwait's dispersed veto points, exhortative sanctions, cautious discourse, and lapsed AI budgets, by contrast, confine initiatives to pilot mode de - spite equivalent fiscal resources. These findings refine institutional theory by showing that vertical rule coherence, not wealth, determines AI's public value yield, and temper post-NPM optimism by revealing that efficiency metrics advance societal goals only when backed by enforceable safeguards. To curb ethics washing and test the transferability of these mechanisms beyond the GCC, future research should track rule diffusion over time, experiment with blended legitimacy-efficiency scorecards, and investigate how narrative framing shapes citizen consent for data sharing.


Artificial intelligence and the Gulf Cooperation Council workforce adapting to the future of work

arXiv.org Artificial Intelligence

The rapid expansion of artificial intelligence (AI) in the Gulf Cooperation Council (GCC) raises a central question: are investments in compute infrastructure matched by an equally robust build-out of skills, incentives, and governance? Grounded in socio-technical systems (STS) theory, this mixed-methods study audits workforce preparedness across Kingdom of Saudi Arabia (KSA), the United Arab Emirates (UAE), Qatar, Kuwait, Bahrain, and Oman. We combine term frequency--inverse document frequency (TF--IDF) analysis of six national AI strategies (NASs), an inventory of 47 publicly disclosed AI initiatives (January 2017--April 2025), paired case studies, the Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) and the Saudi Data & Artificial Intelligence Authority (SDAIA) Academy, and a scenario matrix linking oil-revenue slack (technical capacity) to regulatory coherence (social alignment). Across the corpus, 34/47 initiatives (0.72; 95% Wilson CI 0.58--0.83) exhibit joint social--technical design; country-level indices span 0.57--0.90 (small n; intervals overlap). Scenario results suggest that, under our modeled conditions, regulatory convergence plausibly binds outcomes more than fiscal capacity: fragmented rules can offset high oil revenues, while harmonized standards help preserve progress under austerity. We also identify an emerging two-track talent system, research elites versus rapidly trained practitioners, that risks labor-market bifurcation without bridging mechanisms. By extending STS inquiry to oil-rich, state-led economies, the study refines theory and sets a research agenda focused on longitudinal coupling metrics, ethnographies of coordination, and outcome-based performance indicators.