Goto

Collaborating Authors

 admin


Randomized Controlled Trials for Conditional Access Optimization Agent

arXiv.org Artificial Intelligence

AI agents are increasingly deployed to automate complex enterprise workflows, yet evidence of their effectiveness in identity governance is limited. We report results from the first randomized controlled trial (RCT) evaluating an AI agent for Conditional Access (CA) policy management in Microsoft Entra. The agent assists with four high-value tasks: policy merging, Zero-Trust baseline gap detection, phased rollout planning, and user-policy alignment. In a production-grade environment, 162 identity administrators were randomly assigned to a control group (no agent) or treatment group (agent-assisted) and asked to perform these tasks. Agent access produced substantial gains: accuracy improved by 48% and task completion time decreased by 43% while holding accuracy constant. The largest benefits emerged on cognitively demanding tasks such as baseline gap detection. These findings demonstrate that purpose-built AI agents can significantly enhance both speed and accuracy in identity administration.


CORTEX: Collaborative LLM Agents for High-Stakes Alert Triage

arXiv.org Artificial Intelligence

Security Operations Centers (SOCs) are overwhelmed by tens of thousands of daily alerts, with only a small fraction corresponding to genuine attacks. This overload creates alert fatigue, leading to overlooked threats and analyst burnout. Classical detection pipelines are brittle and context-poor, while recent LLM-based approaches typically rely on a single model to interpret logs, retrieve context, and adjudicate alerts end-to-end -- an approach that struggles with noisy enterprise data and offers limited transparency. We propose CORTEX, a multi-agent LLM architecture for high-stakes alert triage in which specialized agents collaborate over real evidence: a behavior-analysis agent inspects activity sequences, evidence-gathering agents query external systems, and a reasoning agent synthesizes findings into an auditable decision. To support training and evaluation, we release a dataset of fine-grained SOC investigations from production environments, capturing step-by-step analyst actions and linked tool outputs. Across diverse enterprise scenarios, CORTEX substantially reduces false positives and improves investigation quality over state-of-the-art single-agent LLMs.


Adaptive Wizard for Removing Cross-Tier Misconfigurations in Active Directory

arXiv.org Artificial Intelligence

Security vulnerabilities in Windows Active Directory (AD) systems are typically modeled using an attack graph and hardening AD systems involves an iterative workflow: security teams propose an edge to remove, and IT operations teams manually review these fixes before implementing the removal. As verification requires significant manual effort, we formulate an Adaptive Path Removal Problem to minimize the number of steps in this iterative removal process. In our model, a wizard proposes an attack path in each step and presents it as a set of multiple-choice options to the IT admin. The IT admin then selects one edge from the proposed set to remove. This process continues until the target $t$ is disconnected from source $s$ or the number of proposed paths reaches $B$. The model aims to optimize the human effort by minimizing the expected number of interactions between the IT admin and the security wizard. We first prove that the problem is $\mathcal{\#P}$-hard. We then propose a set of solutions including an exact algorithm, an approximate algorithm, and several scalable heuristics. Our best heuristic, called DPR, can operate effectively on larger-scale graphs compared to the exact algorithm and consistently outperforms the approximate algorithm across all graphs. We verify the effectiveness of our algorithms on several synthetic AD graphs and an AD attack graph collected from a real organization.


OPTIC: Optimizing Patient-Provider Triaging & Improving Communications in Clinical Operations using GPT-4 Data Labeling and Model Distillation

arXiv.org Artificial Intelligence

The COVID-19 pandemic has accelerated the adoption of telemedicine and patient messaging through electronic medical portals (patient medical advice requests, or PMARs). While these platforms enhance patient access to healthcare, they have also increased the burden on healthcare providers due to the surge in PMARs. This study seeks to develop an efficient tool for message triaging to reduce physician workload and improve patient-provider communication. We developed OPTIC (Optimizing Patient-Provider Triaging & Improving Communications in Clinical Operations), a powerful message triaging tool that utilizes GPT-4 for data labeling and BERT for model distillation. The study used a dataset of 405,487 patient messaging encounters from Johns Hopkins Medicine between January and June 2020. High-quality labeled data was generated through GPT-4-based prompt engineering, which was then used to train a BERT model to classify messages as "Admin" or "Clinical." The BERT model achieved 88.85% accuracy on the test set validated by GPT-4 labeling, with a sensitivity of 88.29%, specificity of 89.38%, and an F1 score of 0.8842. BERTopic analysis identified 81 distinct topics within the test data, with over 80% accuracy in classifying 58 topics. The system was successfully deployed through Epic's Nebula Cloud Platform, demonstrating its practical effectiveness in healthcare settings.


'It feels like admin': why are people falling out of love with dating apps?

The Guardian

About 1.4 million people in the UK have left the online dating scene in the past 12 months, but is that a sign that the apps don't work or that people are turning away from dating altogether? Ofcom's 2024 Online Nation report shows that dating app use declined significantly between 2023 and 2024, with a drop of nearly 16% in the use of the top 10 most popular dating apps this year. Tinder experienced the biggest loss, with more than half a million users abandoning the platform since May 2023. Bumble and Hinge were also hit hard, losing 368,000 and 131,000 users respectively in the same period. According to researchers at the University of Leeds' Centre for Love, Sex, and Relationships (CLSR), a contributing factor in the decline of online dating could be a sense of detachment from reality and fatigue at the process.


Pump and Dumps in the Bitcoin Era: Real Time Detection of Cryptocurrency Market Manipulations

arXiv.org Artificial Intelligence

In the last years, cryptocurrencies are increasingly popular. Even people who are not experts have started to invest in these securities and nowadays cryptocurrency exchanges process transactions for over 100 billion US dollars per month. However, many cryptocurrencies have low liquidity and therefore they are highly prone to market manipulation schemes. In this paper, we perform an in-depth analysis of pump and dump schemes organized by communities over the Internet. We observe how these communities are organized and how they carry out the fraud. Then, we report on two case studies related to pump and dump groups. Lastly, we introduce an approach to detect the fraud in real time that outperforms the current state of the art, so to help investors stay out of the market when a pump and dump scheme is in action.


Examining Gender and Power on Wikipedia Through Face and Politeness

arXiv.org Artificial Intelligence

We propose a framework for analyzing discourse by combining two interdependent concepts from sociolinguistic theory: face acts and politeness. While politeness has robust existing tools and data, face acts are less resourced. We introduce a new corpus created by annotating Wikipedia talk pages with face acts and we use this to train a face act tagger. We then employ our framework to study how face and politeness interact with gender and power in discussions between Wikipedia editors. Among other findings, we observe that female Wikipedians are not only more polite, which is consistent with prior studies, but that this difference corresponds with significantly more language directed at humbling aspects of their own face. Interestingly, the distinction nearly vanishes once limiting to editors with administrative power.


Teams of LLM Agents can Exploit Zero-Day Vulnerabilities

arXiv.org Artificial Intelligence

LLM agents have become increasingly sophisticated, especially in the realm of cybersecurity. Researchers have shown that LLM agents can exploit real-world vulnerabilities when given a description of the vulnerability and toy capture-the-flag problems. However, these agents still perform poorly on real-world vulnerabilities that are unknown to the agent ahead of time (zero-day vulnerabilities). In this work, we show that teams of LLM agents can exploit real-world, zero-day vulnerabilities. Prior agents struggle with exploring many different vulnerabilities and long-range planning when used alone. To resolve this, we introduce HPTSA, a system of agents with a planning agent that can launch subagents. The planning agent explores the system and determines which subagents to call, resolving long-term planning issues when trying different vulnerabilities. We construct a benchmark of 15 real-world vulnerabilities and show that our team of agents improve over prior work by up to 4.5$\times$.


Epidemic outbreak prediction using machine learning models

arXiv.org Artificial Intelligence

In today's world, the risk of emerging and re-emerging epidemics have increased. The recent advancement in healthcare technology has made it possible to predict an epidemic outbreak in a region. Early prediction of an epidemic outbreak greatly helps the authorities to be prepared with the necessary medications and logistics required to keep things in control. In this article, we try to predict the epidemic outbreak (influenza, hepatitis and malaria) for the state of New York, USA using machine and deep learning algorithms, and a portal has been created for the same which can alert the authorities and health care organisations of the region in case of an outbreak. The algorithm takes historical data to predict the possible number of cases for 5 weeks into the future. Non-clinical factors like google search trends, social media data and weather data have also been used to predict the probability of an outbreak. Keywords: Epidemic, Clinical analysis, LSTM, ARIMA Introduction More than six different influenza pandemics and epidemics have struck in just a century. Every year nearly 500,000 people die due to seasonal influenza and other epidemics. Even with the advancement of technology, particularly in the healthcare industry, it is impossible to prevent an outbreak, but it's possible to be prepared for one. With the help of machine learning, we can now monitor and forecast the expected number of cases of a given disease for a particular region by using meteorological data, social media data, and historical data. This would be extremely useful for the health care centres and pharmacies of a particular region to be prepared in advance and stock up their inventory if needed. As seen in India during the second wave of COVID-19, due to the suddenness of the outbreak there was an unprecedented demand in healthcare resources from medicines, beds, etc. Such unexpected epidemic or pandemic outbreaks threaten to overwhelm the healthcare system of any region. But knowing the possibility of an outbreak beforehand helps the healthcare system to be prepared in advance, as it gives them enough time to accumulate the necessary items like medicines, oxygen cylinders, etc.


MyVoice: Arabic Speech Resource Collaboration Platform

arXiv.org Artificial Intelligence

We introduce MyVoice, a crowdsourcing platform designed to collect Arabic speech to enhance dialectal speech technologies. This platform offers an opportunity to design large dialectal speech datasets; and makes them publicly available. MyVoice allows contributors to select city/country-level fine-grained dialect and record the displayed utterances. Users can switch roles between contributors and annotators. The platform incorporates a quality assurance system that filters out low-quality and spurious recordings before sending them for validation. During the validation phase, contributors can assess the quality of recordings, annotate them, and provide feedback which is then reviewed by administrators. Furthermore, the platform offers flexibility to admin roles to add new data or tasks beyond dialectal speech and word collection, which are displayed to contributors. Thus, enabling collaborative efforts in gathering diverse and large Arabic speech data.