Government
Fuzzing the brain: Automated stress testing for the safety of ML-driven neurostimulation
Downing, Mara, Peng, Matthew, Granley, Jacob, Beyeler, Michael, Bultan, Tevfik
Objective: Machine learning (ML) models are increasingly used to generate electrical stimulation patterns in neuroprosthetic devices such as visual prostheses. While these models promise precise and personalized control, they also introduce new safety risks when model outputs are delivered directly to neural tissue. We propose a systematic, quantitative approach to detect and characterize unsafe stimulation patterns in ML-driven neurostimulation systems. Approach: We adapt an automated software testing technique known as coverage-guided fuzzing to the domain of neural stimulation. Here, fuzzing performs stress testing by perturbing model inputs and tracking whether resulting stimulation violates biophysical limits on charge density, instantaneous current, or electrode co-activation. The framework treats encoders as black boxes and steers exploration with coverage metrics that quantify how broadly test cases span the space of possible outputs and violation types. Main results: Applied to deep stimulus encoders for the retina and cortex, the method systematically reveals diverse stimulation regimes that exceed established safety limits. Two violation-output coverage metrics identify the highest number and diversity of unsafe outputs, enabling interpretable comparisons across architectures and training strategies. Significance: Violation-focused fuzzing reframes safety assessment as an empirical, reproducible process. By transforming safety from a training heuristic into a measurable property of the deployed model, it establishes a foundation for evidence-based benchmarking, regulatory readiness, and ethical assurance in next-generation neural interfaces.
Enhancing Dimensionality Prediction in Hybrid Metal Halides via Feature Engineering and Class-Imbalance Mitigation
Karabin, Mariia, Armstrong, Isaac, Beck, Leo, Apanel, Paulina, Eisenbach, Markus, Mitzi, David B., Terletska, Hanna, Heinz, Hendrik
We present a machine learning framework for predicting the structural dimensionality of hybrid metal halides (HMHs), including organic-inorganic perovskites, using a combination of chemically-informed feature engineering and advanced class-imbalance handling techniques. The dataset, consisting of 494 HMH structures, is highly imbalanced across dimensionality classes (0D, 1D, 2D, 3D), posing significant challenges to predictive modeling. This dataset was later augmented to 1336 via the Synthetic Minority Oversampling Technique (SMOTE) to mitigate the effects of the class imbalance. We developed interaction-based descriptors and integrated them into a multi-stage workflow that combines feature selection, model stacking, and performance optimization to improve dimensionality prediction accuracy. Our approach significantly improves F1-scores for underrepresented classes, achieving robust cross-validation performance across all dimensionalities.
MCP-AI: Protocol-Driven Intelligence Framework for Autonomous Reasoning in Healthcare
ElSayed, Zag, Erickson, Craig, Pedapati, Ernest
Healthcare AI systems have historically faced challenges in merging contextual reasoning, long-term state management, and human-verifiable workflows into a cohesive framework. This paper introduces a completely innovative architecture and concept: combining the Model Context Protocol (MCP) with a specific clinical application, known as MCP-AI. This integration allows intelligent agents to reason over extended periods, collaborate securely, and adhere to authentic clinical logic, representing a significant shift away from traditional Clinical Decision Support Systems (CDSS) and prompt-based Large Language Models (LLMs). As healthcare systems become more complex, the need for autonomous, context-aware clinical reasoning frameworks has become urgent. We present MCP-AI, a novel architecture for explainable medical decision-making built upon the Model Context Protocol (MCP) a modular, executable specification for orchestrating generative and descriptive AI agents in real-time workflows. Each MCP file captures clinical objectives, patient context, reasoning state, and task logic, forming a reusable and auditable memory object. Unlike conventional CDSS or stateless prompt-based AI systems, MCP-AI supports adaptive, longitudinal, and collaborative reasoning across care settings. MCP-AI is validated through two use cases: (1) diagnostic modeling of Fragile X Syndrome with comorbid depression, and (2) remote coordination for Type 2 Diabetes and hypertension. In either scenario, the protocol facilitates physician-in-the-loop validation, streamlines clinical processes, and guarantees secure transitions of AI responsibilities between healthcare providers. The system connects with HL7/FHIR interfaces and adheres to regulatory standards, such as HIPAA and FDA SaMD guidelines. MCP-AI provides a scalable basis for interpretable, composable, and safety-oriented AI within upcoming clinical environments.
Enhancing Clinical Note Generation with ICD-10, Clinical Ontology Knowledge Graphs, and Chain-of-Thought Prompting Using GPT-4
Makohon, Ivan, Najafi, Mohamad, Wu, Jian, Brochhausen, Mathias, Li, Yaohang
In the past decade a surge in the amount of electronic health record (EHR) data in the United States, attributed to a favorable policy environment created by the Health Information Technology for Economic and Clinical Health (HITECH) Act of 2009 and the 21st Century Cures Act of 2016. Clinical notes for patients' assessments, diagnoses, and treatments are captured in these EHRs in free-form text by physicians, who spend a considerable amount of time entering and editing them. Manually writing clinical notes takes a considerable amount of a doctor's valuable time, increasing the patient's waiting time and possibly delaying diagnoses. Large language models (LLMs) possess the ability to generate news articles that closely resemble human-written ones. We investigate the usage of Chain-of-Thought (CoT) prompt engineering to improve the LLM's response in clinical note generation. In our prompts, we use as input International Classification of Diseases (ICD) codes and basic patient information. We investigate a strategy that combines the traditional CoT with semantic search results to improve the quality of generated clinical notes. Additionally, we infuse a knowledge graph (KG) built from clinical ontology to further enrich the domain-specific knowledge of generated clinical notes. We test our prompting technique on six clinical cases from the CodiEsp test dataset using GPT-4 and our results show that it outperformed the clinical notes generated by standard one-shot prompts.
Public Sentiment Analysis of Traffic Management Policies in Knoxville: A Social Media Driven Study
This study presents a comprehensive analysis of public sentiment toward traffic management policies in Knoxville, Tennessee, utilizing social media data from Twitter and Reddit platforms. We collected and analyzed 7906 posts spanning January 2022 to December 2023, employing Valence Aware Dictionary and sEntiment Reasoner (VADER) for sentiment analysis and Latent Dirichlet Allocation (LDA) for topic modeling. Our findings reveal predominantly negative sentiment, with significant variations across platforms and topics. Twitter exhibited more negative sentiment compared to Reddit. Topic modeling identified six distinct themes, with construction-related topics showing the most negative sentiment while general traffic discussions were more positive. Spatiotemporal analysis revealed geographic and temporal patterns in sentiment expression. The research demonstrates social media's potential as a real-time public sentiment monitoring tool for transportation planning and policy evaluation.
Towards a Generalisable Cyber Defence Agent for Real-World Computer Networks
Recent advances in deep reinforcement learning for autonomous cyber defence have resulted in agents that can successfully defend simulated computer networks against cyber-attacks. However, many of these agents would need retraining to defend networks with differing topology or size, making them poorly suited to real-world networks where topology and size can vary over time. In this research we introduce a novel set of Topological Extensions for Reinforcement Learning Agents (TERLA) that provide generalisability for the defence of networks with differing topology and size, without the need for retraining. Our approach involves the use of heterogeneous graph neural network layers to produce a fixed-size latent embedding representing the observed network state. This representation learning stage is coupled with a reduced, fixed-size, semantically meaningful and interpretable action space. We apply TERLA to a standard deep reinforcement learning Proximal Policy Optimisation (PPO) agent model, and to reduce the sim-to-real gap, conduct our research using Cyber Autonomy Gym for Experimentation (CAGE) Challenge 4. This Cyber Operations Research Gym environment has many of the features of a real-world network, such as realistic Intrusion Detection System (IDS) events and multiple agents defending network segments of differing topology and size. TERLA agents retain the defensive performance of vanilla PPO agents whilst showing improved action efficiency. Generalisability has been demonstrated by showing that all TERLA agents have the same network-agnostic neural network architecture, and by deploying a single TERLA agent multiple times to defend network segments with differing topology and size, showing improved defensive performance and efficiency.
VERIRAG: A Post-Retrieval Auditing of Scientific Study Summaries
Mohole, Shubham, Choi, Hongjun, Liu, Shusen, Klymko, Christine, Kushwaha, Shashank, Shi, Derek, Sakla, Wesam, Galhotra, Sainyam, Glatt, Ruben
Can democratized information gatekeepers and community note writers effectively decide what scientific information to amplify? Lacking domain expertise, such gatekeepers rely on automated reasoning agents that use RAG to ground evidence to cited sources. But such standard RAG systems validate summaries via semantic grounding and suffer from "methodological blindness," treating all cited evidence as equally valid regardless of rigor. To address this, we introduce VERIRAG, a post-retrieval auditing framework that shifts the task from classification to methodological vulnerability detection. Using private Small Language Models (SLMs), VERIRAG audits source papers against the Veritable taxonomy of statistical rigor. We contribute: (1) a benchmark of 1,730 summaries with realistic, non-obvious perturbations modeled after retracted papers; (2) the auditable Veritable taxonomy; and (3) an operational system that improves Macro F1 by at least 19 points over baselines using GPT-based SLMs, a result that replicates across MISTRAL and Gemma architectures. Given the complexity of detecting non-obvious flaws, we view VERIRAG as a "vulnerability-detection copilot," providing structured audit trails for human editors. In our experiments, individual human testers found over 80% of the generated audit trails useful for decision-making. We plan to release the dataset and code to support responsible science advocacy.
Government promises 50,000 new apprenticeships in youth employment push
The government says some 50,000 young people are expected to benefit from a programme to expand apprenticeships as it looks to tackle youth unemployment. The ยฃ725 million package, which was earmarked in the Budget and covers the next three years, will be used to create apprenticeships in sectors including AI, hospitality and engineering. Apprenticeships for people under the age of 25 at small and medium-sized businesses will be fully funded as part of the package, removing the 5% that they currently have to pay. The government is aiming to reverse a decline in the number of young people starting apprenticeships, which has fallen by almost 40% in the past decade. The funding also includes ยฃ140m for a pilot that the Department for Work and Pensions says will allow local mayors to connect young people with employers and apprenticeship opportunities, although it is unclear exactly how the money will be used.
A robot walks into a bar: can a Melbourne researcher get AI to do comedy?
An ensemble of about 10 robots - which will not be androids but ground vehicles between 40cm and 2m tall - will work with humans to learn how to be funny. An ensemble of about 10 robots - which will not be androids but ground vehicles between 40cm and 2m tall - will work with humans to learn how to be funny. A robot walks into a bar: can a Melbourne researcher get AI to do comedy? Robots can make humans laugh - mostly when they fall over - but a new research project is looking at whether robots using AI could ever be genuinely funny. If you ask ChatGPT for a funny joke, it will serve you up something that belongs in a Christmas cracker: "Why don't skeletons fight each other? Because they don't have the guts."