ASTRIDE: A Security Threat Modeling Platform for Agentic-AI Applications

Bandara, Eranga, Hass, Amin, Gore, Ross, Shetty, Sachin, Mukkamala, Ravi, Bouk, Safdar H., Liang, Xueping, Keong, Ng Wee, De Zoysa, Kasun, Withanage, Aruna, Loganathan, Nilaan

arXiv.org Artificial Intelligence

AI agent-based systems are becoming increasingly integral to modern software architectures, enabling autonomous decision-making, dynamic task execution, and multimodal interactions through large language models (LLMs). However, these systems introduce novel and evolving security challenges, including prompt injection attacks, context poisoning, model manipulation, and opaque agent-to-agent communication, which are not effectively captured by traditional threat modeling frameworks. In this paper, we introduce ASTRIDE, an automated threat modeling platform purpose-built for AI agent-based systems. ASTRIDE extends the classical STRIDE framework by introducing a new threat category, A for AI Agent-Specific Attacks, which encompasses emerging vulnerabilities unique to agent-based applications, such as prompt injection, unsafe tool invocation, and reasoning subversion. To automate threat modeling, ASTRIDE combines a consortium of fine-tuned vision-language models (VLMs) with the OpenAI-gpt-oss reasoning LLM to perform end-to-end analysis directly from visual agent architecture diagrams, such as data flow diagrams (DFDs). LLM agents orchestrate the end-to-end threat modeling automation process by coordinating interactions between the VLM consortium and the reasoning LLM. Our evaluations demonstrate that ASTRIDE provides accurate, scalable, and explainable threat modeling for next-generation intelligent systems. To the best of our knowledge, ASTRIDE is the first framework to both extend STRIDE with AI-specific threats and integrate fine-tuned VLMs with a reasoning LLM to fully automate diagram-driven threat modeling in AI agent-based applications.
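The extended taxonomy can be sketched in a few lines. The enum mirrors the abstract's STRIDE-plus-A categories; the keyword tagger is an illustrative stand-in only (in the paper, classification is performed by the VLM consortium and reasoning LLM, not by keyword matching), and the term list and function names are assumptions:

```python
from enum import Enum

# ASTRIDE taxonomy as described in the abstract: classical STRIDE plus "A".
class AstrideCategory(Enum):
    AI_AGENT_SPECIFIC = "A"   # prompt injection, unsafe tool invocation, reasoning subversion
    SPOOFING = "S"
    TAMPERING = "T"
    REPUDIATION = "R"
    INFORMATION_DISCLOSURE = "I"
    DENIAL_OF_SERVICE = "D"
    ELEVATION_OF_PRIVILEGE = "E"

# Hypothetical keyword-based tagger (placeholder for the VLM/LLM pipeline):
# it routes a threat description to the new AI-agent category when agent-style
# attack terms appear, otherwise falls back to a supplied STRIDE class.
AGENT_ATTACK_TERMS = (
    "prompt injection", "context poisoning",
    "tool invocation", "reasoning subversion",
)

def tag_threat(description: str,
               default: AstrideCategory = AstrideCategory.TAMPERING) -> AstrideCategory:
    text = description.lower()
    if any(term in text for term in AGENT_ATTACK_TERMS):
        return AstrideCategory.AI_AGENT_SPECIFIC
    return default
```

For example, `tag_threat("prompt injection via user input")` lands in the new A category, while a classical threat passed with `default=AstrideCategory.REPUDIATION` retains its STRIDE class.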


Standardization of Psychiatric Diagnoses -- Role of Fine-tuned LLM Consortium and OpenAI-gpt-oss Reasoning LLM Enabled Decision Support System

Bandara, Eranga, Gore, Ross, Yarlagadda, Atmaram, Clayton, Anita H., Samuel, Preston, Rhea, Christopher K., Shetty, Sachin

arXiv.org Artificial Intelligence

The diagnosis of most mental disorders, including psychiatric evaluations, primarily depends on dialogues between psychiatrists and patients. This subjective process can lead to variability in diagnoses across clinicians and patients, resulting in inconsistencies and challenges in achieving reliable outcomes. To address these issues and standardize psychiatric diagnoses, we propose a Fine-Tuned Large Language Model (LLM) Consortium and OpenAI-gpt-oss Reasoning LLM-enabled Decision Support System for the clinical diagnosis of mental disorders. Our approach leverages fine-tuned LLMs trained on conversational datasets involving psychiatrist-patient interactions focused on mental health conditions (e.g., depression). The diagnostic predictions from individual models are aggregated through a consensus-based decision-making process, refined by the OpenAI-gpt-oss reasoning LLM. We propose a novel method for deploying LLM agents that orchestrate communication between the LLM consortium and the reasoning LLM, ensuring transparency, reliability, and responsible AI across the entire diagnostic workflow. Experimental results demonstrate the transformative potential of combining fine-tuned LLMs with a reasoning model to create a robust and highly accurate diagnostic system for mental health assessment. A prototype of the proposed platform, integrating three fine-tuned LLMs with the OpenAI-gpt-oss reasoning LLM, was developed in collaboration with the U.S. Army Medical Research Team in Norfolk, Virginia, USA. To the best of our knowledge, this work represents the first application of a fine-tuned LLM consortium integrated with a reasoning LLM for clinical mental health diagnosis, paving the way for next-generation AI-powered eHealth systems aimed at standardizing psychiatric diagnoses.
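The consensus-based aggregation step can be sketched as follows. The strict-majority rule and the `reasoning_fallback` callback are assumptions standing in for the paper's actual consensus mechanism and OpenAI-gpt-oss refinement, which the abstract does not specify in detail:

```python
from collections import Counter
from typing import Callable, Sequence

def consortium_consensus(
    predictions: Sequence[str],
    reasoning_fallback: Callable[[Sequence[str]], str],
) -> str:
    """Aggregate diagnostic labels from a consortium of fine-tuned models.

    A strict majority among the models is accepted directly; otherwise the
    full set of predictions is deferred to a reasoning model, represented
    here by a hypothetical callback.
    """
    counts = Counter(predictions)
    label, votes = counts.most_common(1)[0]
    if votes > len(predictions) / 2:   # strict majority -> accept consensus
        return label
    return reasoning_fallback(predictions)
```

With three models, `consortium_consensus(["depression", "depression", "anxiety"], fallback)` returns the majority label without invoking the fallback, while a three-way split escalates to the reasoning step.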


Are LLMs Court-Ready? Evaluating Frontier Models on Indian Legal Reasoning

Juvekar, Kush, Bhattacharya, Arghya, Khadloya, Sai, Saxena, Utkarsh

arXiv.org Artificial Intelligence

Large language models (LLMs) are entering legal workflows, yet we lack a jurisdiction-specific framework to assess their baseline competence therein. We use India's public legal examinations as a transparent proxy. Our multi-year benchmark assembles objective screens from top national and state exams and evaluates open and frontier LLMs under real-world exam conditions. To probe beyond multiple-choice questions, we also include a lawyer-graded, paired-blinded study of long-form answers from the Supreme Court's Advocate-on-Record exam. This is, to our knowledge, the first exam-grounded, India-specific yardstick for LLM court-readiness released with datasets and protocols. Our work shows that while frontier systems consistently clear historical cutoffs and often match or exceed recent top-scorer bands on objective exams, none surpasses the human topper on long-form reasoning. Grader notes converge on three reliability failure modes: procedural or format compliance, authority or citation discipline, and forum-appropriate voice and structure. These findings delineate where LLMs can assist (checks, cross-statute consistency, statute and precedent lookups) and where human leadership remains essential: forum-specific drafting and filing, procedural and relief strategy, reconciling authorities and exceptions, and ethical, accountable judgment.


Tim Berners-Lee Invented the World Wide Web. Now He Wants to Save It

The New Yorker

In 1989, Sir Tim revolutionized the online world. Today, in the era of misinformation, addictive algorithms, and extractive monopolies, he thinks he can do it again. Berners-Lee is building tools that aim to resist the Big Tech platforms, give users control over their own data, and prevent A.I. from hollowing out the open web. Tim Berners-Lee may have the smallest fame-to-impact ratio of anyone living. Strangers hardly ever recognize his face; on "Jeopardy!," his invention of the World Wide Web, in 1989, makes an easy clue, but people informed of it often respond with a joke: Wasn't that Al Gore? Still, his creation keeps growing, absorbing our reality in the process. If you're reading this online, Berners-Lee wrote the hypertext markup language (HTML) that your browser is interpreting. He's the necessary condition behind everything from Amazon to Wikipedia, and if A.I. brings about what Sam Altman recently called "the gentle singularity"--or else buries us in slop--that, too, will be an outgrowth of his global collective consciousness. Somehow, the man responsible for all of this is a mild-mannered British Unitarian who loves model trains and folk music, and recently celebrated his seventieth birthday with a picnic on a Welsh mountain. An emeritus professor at Oxford and M.I.T., he divides his time between the U.K., Canada, and Concord, Massachusetts, where he and his wife, Rosemary Leith, live in a stout greige house older than the Republic. On the summer morning when I visited, geese honked and cicadas whined. Leith, an investor and a nonprofit director who co-founded a dot-com-era women's portal called Flametree, greeted me at the door. "We're basically guardians of the house," she said, showing me its antique features. I almost missed Berners-Lee in the converted-barn kitchen, standing, expectantly, in a blue plaid shirt. He shook my hand, then glanced at Leith. Minutes later, he and I were gliding across a pond behind the house.
Berners-Lee is bronzed and wiry, with sharp cheekbones and faraway blue eyes, the right one underscored by an X-shaped wrinkle. A twitchier figure emerged when he spoke.


Incorporating LLM Embeddings for Variation Across the Human Genome

Niu, Hongqian, Bryan, Jordan, Li, Xihao, Li, Didong

arXiv.org Artificial Intelligence

In the past few years, foundation models based on large transformer networks, such as Google's BERT (Kenton and Toutanova, 2019) and OpenAI's GPT family (Radford, 2018), have been shown to be invaluable aids for scientific discovery in the analysis of genomic data (Cui et al., 2024; Theodoris et al., 2023; Chen and Zou, 2025). More specifically, foundation models targeted for genomic applications typically comprise models trained on enormous databases of experimental data, such as scGPT (Cui et al., 2024), which was trained on transcriptomes from 33 million human cells from 441 different studies, or the GeneFormer model (Theodoris et al., 2023), which was trained on 29.9 million human single-cell transcriptomes. On the other hand, foundation models pre-trained on internet-scale databases of natural language text may offer distinct advantages, such as capturing niche biological relationships that are widely documented in the scientific literature but not necessarily represented experimentally in large-scale genomics datasets. For this reason, some recent works have used the embedding outputs of large language models (LLMs) such as ChatGPT (Radford, 2018) to encode the biological information contained in text-based gene descriptions, such as those in the NCBI database (Schoch et al., 2020). Notably, Chen and Zou (2025) show that these text-based gene descriptors can be input to GPT-3.5 to obtain gene embeddings that act as features/covariates for standard prediction algorithms, a method denoted GenePT.
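The GenePT-style recipe, embedding a gene's text description and then treating the embedding as features, can be sketched with a toy hashing "embedding" in place of a real LLM endpoint. The hash-based `embed` and the nearest-neighbor `predict` below are placeholders for GPT-3.5 embeddings and a downstream prediction algorithm, not the paper's actual method:

```python
import hashlib
import math

def embed(text: str, dim: int = 32) -> list[float]:
    """Toy fixed-length text embedding via hashed bag-of-words (stand-in
    for a real LLM embedding API); L2-normalized for cosine comparison."""
    vec = [0.0] * dim
    for word in text.lower().split():
        h = int(hashlib.md5(word.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are already unit-normalized, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

def predict(query: str, labeled: list[tuple[str, str]]) -> str:
    """Nearest-neighbor prediction over (description, label) pairs,
    using the embedded descriptions as features."""
    q = embed(query)
    best = max(labeled, key=lambda pair: cosine(q, embed(pair[0])))
    return best[1]
```

The point of the sketch is the pipeline shape: text description in, fixed-length vector out, vector used as covariates for a standard predictor.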


Standards in the Preparation of Biomedical Research Metadata: A Bridge2AI Perspective

Caufield, Harry, Ghosh, Satrajit, Kong, Sek Wong, Parker, Jillian, Sheffield, Nathan, Patel, Bhavesh, Williams, Andrew, Clark, Timothy, Munoz-Torres, Monica C.

arXiv.org Artificial Intelligence

AI-readiness describes the degree to which data may be optimally and ethically used for subsequent AI and Machine Learning (AI/ML) methods, where those methods may involve some combination of model training, data classification, and ethical, explainable prediction. The Bridge2AI consortium has defined the particular criteria a biomedical dataset may possess to render it AI-ready: in brief, a dataset's readiness is related to its FAIRness, provenance, degree of characterization, explainability, sustainability, and computability, in addition to its accompaniment with documentation about ethical data practices. To ensure AI-readiness and to clarify data structure and relationships within Bridge2AI's Grand Challenges (GCs), particular types of metadata are necessary. The GCs within the Bridge2AI initiative include four data-generating projects focusing on generating AI/ML-ready datasets to tackle complex biomedical and behavioral research problems. These projects develop standardized, multimodal data, tools, and training resources to support AI integration, while addressing ethical data practices. Examples include using voice as a biomarker, building interpretable genomic tools, modeling disease trajectories with diverse multimodal data, and mapping cellular and molecular health indicators across the human body. This report assesses the state of metadata creation and standardization in the Bridge2AI GCs, provides guidelines where required, and identifies gaps and areas for improvement across the program. New projects, including those outside the Bridge2AI consortium, would benefit from what we have learned about creating metadata as part of efforts to promote AI readiness.


Standardization of Neuromuscular Reflex Analysis -- Role of Fine-Tuned Vision-Language Model Consortium and OpenAI gpt-oss Reasoning LLM Enabled Decision Support System

Bandara, Eranga, Gore, Ross, Shetty, Sachin, Mukkamala, Ravi, Rhea, Christopher, Yarlagadda, Atmaram, Kaushik, Shaifali, De Silva, L. H. M. P., Maznychenko, Andriy, Sokolowska, Inna, Hass, Amin, De Zoysa, Kasun

arXiv.org Artificial Intelligence

Accurate assessment of neuromuscular reflexes, such as the H-reflex, plays a critical role in sports science, rehabilitation, and clinical neurology. Traditional analysis of H-reflex EMG waveforms is subject to variability and interpretation bias among clinicians and researchers, limiting reliability and standardization. To address these challenges, we propose a Fine-Tuned Vision-Language Model (VLM) Consortium and a reasoning Large-Language Model (LLM)-enabled Decision Support System for automated H-reflex waveform interpretation and diagnosis. Our approach leverages multiple VLMs, each fine-tuned on curated datasets of H-reflex EMG waveform images annotated with clinical observations, recovery timelines, and athlete metadata. These models are capable of extracting key electrophysiological features and predicting neuromuscular states, including fatigue, injury, and recovery, directly from EMG images and contextual metadata. Diagnostic outputs from the VLM consortium are aggregated using a consensus-based method and refined by a specialized reasoning LLM, which ensures robust, transparent, and explainable decision support for clinicians and sports scientists. The end-to-end platform orchestrates seamless communication between the VLM ensemble and the reasoning LLM, integrating prompt engineering strategies and automated reasoning workflows using LLM Agents. Experimental results demonstrate that this hybrid system delivers highly accurate, consistent, and interpretable H-reflex assessments, significantly advancing the automation and standardization of neuromuscular diagnostics. To our knowledge, this work represents the first integration of a fine-tuned VLM consortium with a reasoning LLM for image-based H-reflex analysis, laying the foundation for next-generation AI-assisted neuromuscular assessment and athlete monitoring platforms.


Training Flexible Models of Genetic Variant Effects from Functional Annotations using Accelerated Linear Algebra

Amin, Alan N., Potapczynski, Andres, Wilson, Andrew Gordon

arXiv.org Artificial Intelligence

To understand how genetic variants in human genomes manifest in phenotypes -- traits like height or diseases like asthma -- geneticists have sequenced and measured hundreds of thousands of individuals. Geneticists use this data to build models that predict how a genetic variant impacts phenotype given genomic features of the variant, like DNA accessibility or the presence of nearby DNA-bound proteins. As more data and features become available, one might expect predictive models to improve. Unfortunately, training these models is bottlenecked by the need to solve expensive linear algebra problems because variants in the genome are correlated with nearby variants, requiring inversion of large matrices. Previous methods have therefore been restricted to fitting small models and to fitting simplified summary statistics rather than the full likelihood of the statistical model. In this paper, we leverage modern fast linear algebra techniques to develop DeepWAS (Deep genome Wide Association Studies), a method to train large and flexible neural network predictive models to optimize likelihood. Notably, we find that larger models only improve performance when using our full likelihood approach; when trained by fitting traditional summary statistics, larger models perform no better than small ones. We find larger models trained on more features make better predictions, potentially improving disease predictions and therapeutic target identification.
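The linear-algebra bottleneck described above, solving systems involving a large variant-correlation matrix without ever forming its inverse, can be illustrated with a matrix-free conjugate-gradient solve. This is a generic sketch of the technique class (iterative solvers needing only matrix-vector products), not DeepWAS's actual implementation:

```python
import numpy as np

def conjugate_gradient(matvec, b, tol=1e-8, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A, given only a
    matrix-vector product `matvec`; no inverse or factorization is formed."""
    x = np.zeros_like(b)
    r = b - matvec(x)          # initial residual
    p = r.copy()               # initial search direction
    rs = r @ r
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rs / (p @ Ap)  # step size along the search direction
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Example: solve R x = b for a small SPD correlation-like matrix. At genome
# scale, `matvec` would exploit structure (sparsity, banding) in R so that
# the large dense inverse is never materialized.
R = np.array([[2.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, 2.0])
x = conjugate_gradient(lambda v: R @ v, b)
```

Because the solver touches the matrix only through `matvec`, the same code scales to correlation matrices far too large to invert explicitly.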


Technical Insights and Legal Considerations for Advancing Federated Learning in Bioinformatics

Malpetti, Daniele, Scutari, Marco, Gualdi, Francesco, van Setten, Jessica, van der Laan, Sander, Haitjema, Saskia, Lee, Aaron Mark, Hering, Isabelle, Mangili, Francesca

arXiv.org Machine Learning

Federated learning leverages data across institutions to improve clinical discovery while complying with data-sharing restrictions and protecting patient privacy. As the evolution of biobanks in genetics and systems biology has proved, accessing more extensive and varied data pools leads to a faster and more robust exploration and translation of results. More widespread use of federated learning may have the same impact in bioinformatics, allowing access to many combinations of genotypic, phenotypic and environmental information that are undercovered or not included in existing biobanks. This paper reviews the methodological, infrastructural and legal issues that academic and clinical institutions must address before implementing it. Finally, we provide recommendations for the reliable use of federated learning and its effective translation into clinical practice.
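The basic mechanism, local training with only parameter updates shared across institutions, can be sketched as a FedAvg-style weighted average. This is illustrative only; the paper reviews a broader landscape, and real clinical deployments would add safeguards such as secure aggregation:

```python
def federated_average(updates):
    """Combine locally trained model parameters from several institutions.

    updates: list of (params, n_samples) pairs, one per institution, where
    params is a flat list of floats. Each site contributes in proportion to
    its local sample count; raw patient data never leaves the site.
    """
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [
        sum(params[i] * n for params, n in updates) / total
        for i in range(dim)
    ]
```

For example, averaging `([1.0, 0.0], 1)` from a small site with `([3.0, 2.0], 3)` from a larger one weights the larger cohort three times as heavily.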


OpenAI rejects $97.4bn Musk bid and says company is not for sale

The Guardian

OpenAI on Friday rejected a $97.4bn bid from a consortium led by billionaire Elon Musk for the ChatGPT maker, saying the startup is not for sale. The unsolicited approach is Musk's latest attempt to block the startup he co-founded with CEO Sam Altman – but later left – from becoming a for-profit firm, as it looks to secure more capital and stay ahead in the AI race. "OpenAI is not for sale, and the board has unanimously rejected Mr Musk's latest attempt to disrupt his competition. Any potential reorganization of OpenAI will strengthen our nonprofit and its mission to ensure AGI benefits all of humanity," OpenAI said on X, quoting its chair Bret Taylor, on behalf of its board. On Tuesday, Altman told news website Axios that OpenAI was not for sale.