 Government


Concentration of corporate power a 'huge' concern: U.N. rights chief

The Japan Times

Volker Turk, United Nations high commissioner for human rights, attends the Human Rights Council in Geneva on Sept. 8. | REUTERS

Geneva - A few tech giants accumulating massive power, coupled with artificial intelligence, is posing huge global rights challenges and needs regulation, the U.N. human rights chief said in an interview. Amid increasing worries over threats to democracy and with a growing number of countries at risk of sliding towards autocracy, Volker Turk said a key concern was the seemingly unbridled power of a small number of technology companies. In an interview this week at the U.N. rights office overlooking Lake Geneva, he pointed to how seven or eight big tech companies now boast more wealth than the entire economies of even industrialized nations.


Russia-Ukraine war: List of key events, day 1,357

Al Jazeera

Ukraine's top military commander, Oleksandr Syrskii, said the army's situation has "significantly worsened" in parts of the southeastern Zaporizhia region amid fierce fighting with Russian forces. Ukrainian President Volodymyr Zelenskyy said in a post on X that he had received an update from Syrskii, which conveyed that the situation "remains difficult" in the Zaporizhia region, as well as in the direction of the embattled city of Pokrovsk.


UK seeking to curb AI child sex abuse imagery with tougher testing

BBC News

The UK government will allow tech firms and child safety charities to proactively test artificial intelligence tools to make sure they cannot create child sexual abuse imagery. An amendment to the Crime and Policing Bill announced on Wednesday would enable authorised testers to assess models for their ability to generate illegal child sexual abuse material (CSAM) prior to their release. Technology Secretary Liz Kendall said the measures would ensure AI systems can be made safe at the source - though some campaigners argue more still needs to be done. It comes as the Internet Watch Foundation (IWF) said the number of AI-related CSAM reports had doubled over the past year. The charity, one of only a few in the world licensed to actively search for child abuse content online, said it had removed 426 pieces of reported material between January and October 2025.


Tech companies and UK child safety agencies to test AI tools' ability to create abuse images

The Guardian

Kanishka Narayan, the minister for AI and online safety, said the measure was 'ultimately stopping abuse before it happens'. Tech companies and child protection agencies will be given the power to test whether artificial intelligence tools can produce child abuse images under a new UK law. The announcement was made as a safety watchdog revealed that reports of AI-generated child sexual abuse material [CSAM] have more than doubled in the past year, from 199 in 2024 to 426 in 2025. Under the change, the government will give designated AI companies and child safety organisations permission to examine AI models - the underlying technology for chatbots such as ChatGPT and image generators such as Google's Veo 3 - and ensure they have safeguards to prevent them from creating images of child sexual abuse.


LLM Output Drift: Cross-Provider Validation & Mitigation for Financial Workflows

arXiv.org Machine Learning

Financial institutions deploy Large Language Models (LLMs) for reconciliations, regulatory reporting, and client communications, but nondeterministic outputs (output drift) undermine auditability and trust. We quantify drift across five model architectures (7B-120B parameters) on regulated financial tasks, revealing a stark inverse relationship: smaller models (Granite-3-8B, Qwen2.5-7B) achieve 100% output consistency at T=0.0, while GPT-OSS-120B exhibits only 12.5% consistency (95% CI: 3.5-36.0%) regardless of configuration (p<0.0001, Fisher's exact test). This finding challenges conventional assumptions that larger models are universally superior for production deployment. Our contributions include: (i) a finance-calibrated deterministic test harness combining greedy decoding (T=0.0), fixed seeds, and SEC 10-K structure-aware retrieval ordering; (ii) task-specific invariant checking for RAG, JSON, and SQL outputs using finance-calibrated materiality thresholds (plus or minus 5%) and SEC citation validation; (iii) a three-tier model classification system enabling risk-appropriate deployment decisions; and (iv) an audit-ready attestation system with dual-provider validation. We evaluated five models (Qwen2.5-7B via Ollama, Granite-3-8B via IBM watsonx.ai, Llama-3.3-70B, Mistral-Medium-2505, and GPT-OSS-120B) across three regulated financial tasks. Across 480 runs (n=16 per condition), structured tasks (SQL) remain stable even at T=0.2, while RAG tasks show drift (25-75%), revealing task-dependent sensitivity. Cross-provider validation confirms deterministic behavior transfers between local and cloud deployments. We map our framework to Financial Stability Board (FSB), Bank for International Settlements (BIS), and Commodity Futures Trading Commission (CFTC) requirements, demonstrating practical pathways for compliance-ready AI deployments.
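To make the reported numbers concrete, here is a minimal Python sketch of how output consistency and its confidence interval might be computed. It is not the authors' harness: the function names are hypothetical, the definition of consistency (share of runs matching the most common output) is an assumption, and so is the reading of 12.5% as 2 of 16 runs. Under that reading, a Wilson score interval gives roughly 3.5% to 36.0%, which is in line with the interval quoted in the abstract.

```python
import math
from collections import Counter


def consistency_rate(outputs):
    """Share of runs whose output equals the most common output (assumed definition)."""
    if not outputs:
        raise ValueError("need at least one output")
    most_common_count = Counter(outputs).most_common(1)[0][1]
    return most_common_count / len(outputs)


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for about 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


def within_materiality(value, reference, tolerance=0.05):
    """True if value lies within a relative tolerance (default 5%) of reference."""
    return abs(value - reference) <= tolerance * abs(reference)


# Assumption: 12.5% consistency corresponds to 2 consistent runs out of n=16.
low, high = wilson_interval(2, 16)
print(f"{low:.3f} to {high:.3f}")
```

The `within_materiality` helper illustrates one possible reading of the plus or minus 5% materiality threshold, applied as a relative tolerance on a numeric answer.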


ARGUS: A Framework for Risk-Aware Path Planning in Tactical UGV Operations

arXiv.org Artificial Intelligence

This thesis presents the development of ARGUS, a framework for mission planning for Unmanned Ground Vehicles (UGVs) in tactical environments. The system is designed to translate battlefield complexity and the commander's intent into executable action plans. To this end, ARGUS employs a processing pipeline that takes as input geospatial terrain data, military intelligence on existing threats and their probable locations, and mission priorities defined by the commander. Through a set of integrated modules, the framework processes this information to generate optimized trajectories that balance mission objectives against the risks posed by threats and terrain characteristics. A fundamental capability of ARGUS is its dynamic nature, which allows it to adapt plans in real-time in response to unforeseen events, reflecting the fluid nature of the modern battlefield. The system's interoperability was validated in a practical exercise with the Portuguese Army, where it was successfully demonstrated that the routes generated by the model can be integrated and utilized by UGV control systems. The result is a decision support tool that not only produces an optimal trajectory but also provides the necessary insights for its execution, thereby contributing to greater effectiveness and safety in the employment of autonomous ground systems.
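The idea of balancing route length against risk can be illustrated with a small Python sketch. The abstract does not describe the ARGUS planner's algorithm, so this is a generic stand-in: Dijkstra's algorithm on a 4-connected grid, where each step costs 1 plus a weighted risk value for the cell entered. The grid, the `risk_weight` parameter, and the function name `plan_route` are all hypothetical.

```python
import heapq


def plan_route(risk, start, goal, risk_weight=5.0):
    """Dijkstra on a 4-connected grid.

    Entering a cell costs 1 + risk_weight * risk[row][col], so a larger
    risk_weight makes the planner accept longer detours around threats.
    Returns (path, total_cost), or (None, inf) if the goal is unreachable.
    """
    rows, cols = len(risk), len(risk[0])
    best = {start: 0.0}
    parent = {}
    queue = [(0.0, start)]
    while queue:
        cost, cell = heapq.heappop(queue)
        if cell == goal:
            break
        if cost > best.get(cell, float("inf")):
            continue  # stale queue entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                new_cost = cost + 1.0 + risk_weight * risk[nr][nc]
                if new_cost < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new_cost
                    parent[(nr, nc)] = cell
                    heapq.heappush(queue, (new_cost, (nr, nc)))
    if goal not in best:
        return None, float("inf")
    path = [goal]
    while path[-1] != start:
        path.append(parent[path[-1]])
    return path[::-1], best[goal]


# Hypothetical 3x3 risk map: a threat covers the top two cells of the middle column.
risk_map = [
    [0.0, 0.9, 0.0],
    [0.0, 0.9, 0.0],
    [0.0, 0.0, 0.0],
]
safe_path, safe_cost = plan_route(risk_map, (0, 0), (0, 2), risk_weight=5.0)
short_path, short_cost = plan_route(risk_map, (0, 0), (0, 2), risk_weight=0.0)
print(safe_path, safe_cost)
print(short_path, short_cost)
```

With the risk weight set to 5, the route goes around the threat through the bottom row. With the weight set to 0, it cuts straight across. A commander's priorities could be modeled, in this simplified picture, as the choice of that weight.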


Towards High Resolution Probabilistic Coastal Inundation Forecasting from Sparse Observations

arXiv.org Artificial Intelligence

Coastal flooding poses increasing threats to communities worldwide, necessitating accurate and hyper-local inundation forecasting for effective emergency response. However, real-world deployment of forecasting systems is often constrained by sparse sensor networks, where only a limited subset of locations may have sensors due to budget constraints. To address this challenge, we present DIFF-SPARSE, a masked conditional diffusion model designed for probabilistic coastal inundation forecasting from sparse sensor observations. DIFF-SPARSE primarily utilizes the inundation history of a location and its neighboring locations from a context time window as spatiotemporal context. The fundamental challenge of spatiotemporal prediction based on sparse observations in the context window is addressed by introducing a novel masking strategy during training. Digital elevation data and temporal covariates are utilized as additional spatial and temporal contexts, respectively. A convolutional neural network and a conditional UNet architecture with a cross-attention mechanism are employed to capture the spatiotemporal dynamics in the data. We trained and tested DIFF-SPARSE on coastal inundation data from the Eastern Shore of Virginia and systematically assessed its performance across different sparsity levels (0%, 50%, and 95% missing observations). Our experimental results show that DIFF-SPARSE achieves up to 62% improvement in terms of two forecasting performance metrics compared to existing methods at the 95% sparsity level. Moreover, our ablation studies reveal that digital elevation data becomes more useful at high sparsity levels compared to temporal covariates.
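The abstract does not spell out the masking strategy, so the following Python sketch only shows the general idea of simulating sparse sensors by hiding a random fraction of observations. The function name, the zero fill value, and the uniform random choice of hidden locations are all assumptions, not the DIFF-SPARSE method.

```python
import random


def mask_observations(values, missing_fraction, rng):
    """Hide a random subset of readings to simulate a sparse sensor network.

    Returns (masked_values, mask), where mask[i] is 1 for a kept reading and
    0 for a hidden one. Hidden readings are replaced by 0.0 (an assumption).
    """
    n = len(values)
    n_hidden = round(missing_fraction * n)
    hidden = set(rng.sample(range(n), n_hidden))
    mask = [0 if i in hidden else 1 for i in range(n)]
    masked = [v if m else 0.0 for v, m in zip(values, mask)]
    return masked, mask


rng = random.Random(0)
readings = [0.1 * i for i in range(20)]  # hypothetical water depths at 20 locations
for fraction in (0.0, 0.5, 0.95):
    _, mask = mask_observations(readings, fraction, rng)
    print(fraction, sum(mask), "locations observed")
```

A model trained on inputs masked this way sees the mask alongside the values, so it can learn to tell a missing reading from a genuine zero.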


MULTI-LF: A Continuous Learning Framework for Real-Time Malicious Traffic Detection in Multi-Environment Networks

arXiv.org Artificial Intelligence

Multi-environment (M-En) networks integrate diverse traffic sources, including Internet of Things (IoT) and traditional computing systems, creating complex and evolving conditions for malicious traffic detection. Existing machine learning (ML)-based approaches, typically trained on static single-domain datasets, often fail to generalize across heterogeneous network environments. To address this gap, we develop a realistic Docker-NS3-based testbed that emulates both IoT and traditional traffic conditions, enabling the generation and capture of live, labeled network flows. The resulting M-En Dataset combines this traffic with curated public PCAP traces to provide comprehensive coverage of benign and malicious behaviors. Building on this foundation, we propose Multi-LF, a real-time continuous learning framework that combines a lightweight model (M1) for rapid detection with a deeper model (M2) for high-confidence refinement and adaptation. A confidence-based coordination mechanism enhances efficiency without compromising accuracy, while weight interpolation mitigates catastrophic forgetting during continuous updates. Features extracted at 1-second intervals capture fine-grained temporal patterns, enabling early recognition of evolving attack behaviors. Implemented and evaluated within the Docker-NS3 testbed on live traffic, Multi-LF achieves an accuracy of 0.999 while requiring human intervention for only 0.0026 percent of packets, demonstrating its effectiveness and practicality for real-time malicious traffic detection in heterogeneous network environments.
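The two mechanisms named above, confidence-based coordination and weight interpolation, can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the Multi-LF implementation: the threshold value, the rule that low-confidence predictions are deferred to M2, the interpolation coefficient `alpha`, and the use of scalar weights in a dictionary are all hypothetical.

```python
def route_prediction(fast_label, fast_confidence, deep_model, features, threshold=0.9):
    """Accept the lightweight model's label when it is confident enough;
    otherwise defer to the deeper model (assumed coordination rule)."""
    if fast_confidence >= threshold:
        return fast_label, "M1"
    return deep_model(features), "M2"


def interpolate_weights(old_weights, new_weights, alpha=0.5):
    """Blend parameters as alpha * old + (1 - alpha) * new, name by name.

    Keeping part of the old weights is one common way to limit
    catastrophic forgetting after an update.
    """
    return {
        name: alpha * old_weights[name] + (1 - alpha) * new_weights[name]
        for name in old_weights
    }


def deep_model_stub(features):
    """Stand-in for M2: flags flows with a large first feature (hypothetical)."""
    return "malicious" if features[0] > 100 else "benign"


print(route_prediction("benign", 0.97, deep_model_stub, [250]))
print(route_prediction("benign", 0.55, deep_model_stub, [250]))
print(interpolate_weights({"w": 1.0, "b": 0.0}, {"w": 3.0, "b": 2.0}))
```

In a real system the weights would be tensors and the confidence would come from the model's output probabilities; scalars keep the arithmetic visible here.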


STAR-1: Safer Alignment of Reasoning LLMs with 1K Data

arXiv.org Artificial Intelligence

This paper introduces STAR-1, a high-quality, just-1k-scale safety dataset specifically designed for large reasoning models (LRMs) like DeepSeek-R1. Built on three core principles -- diversity, deliberative reasoning, and rigorous filtering -- STAR-1 aims to address the critical needs for safety alignment in LRMs. Specifically, we begin by integrating existing open-source safety datasets from diverse sources. Then, we curate safety policies to generate policy-grounded deliberative reasoning samples. Lastly, we apply a GPT-4o-based safety scoring system to select training examples aligned with best practices. Experimental results show that fine-tuning LRMs with STAR-1 leads to an average 40% improvement in safety performance across four benchmarks, while only incurring a marginal decrease (e.g., an average of 1.1%) in reasoning ability measured across five reasoning tasks. Extensive ablation studies further validate the importance of our design principles in constructing STAR-1 and analyze its efficacy across both LRMs and traditional LLMs. Our project page is https://ucsc-vlaa.github.io/STAR-1.


Moral Susceptibility and Robustness under Persona Role-Play in Large Language Models

arXiv.org Artificial Intelligence

Large language models (LLMs) increasingly operate in social contexts, motivating analysis of how they express and shift moral judgments. In this work, we investigate the moral response of LLMs to persona role-play, prompting an LLM to assume a specific character. Using the Moral Foundations Questionnaire (MFQ), we introduce a benchmark that quantifies two properties: moral susceptibility and moral robustness, defined from the variability of MFQ scores across and within personas, respectively. We find that, for moral robustness, model family accounts for most of the variance, while model size shows no systematic effect. The Claude family is, by a significant margin, the most robust, followed by Gemini and GPT-4 models, with other families exhibiting lower robustness. In contrast, moral susceptibility exhibits a mild family effect but a clear within-family size effect, with larger variants being more susceptible. Moreover, robustness and susceptibility are positively correlated, an association that is more pronounced at the family level. Additionally, we present moral foundation profiles for models without persona role-play and for personas averaged across models. Together, these analyses provide a systematic view of how persona conditioning shapes moral behavior in large language models.
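The distinction between variability across personas and variability within a persona can be shown with a short Python sketch. The abstract does not give the benchmark's formulas, so the two functions below are assumed stand-ins: the spread of per-persona mean scores as a proxy for susceptibility, and the average spread of repeated scores within each persona, where a lower value would indicate higher robustness. The persona names and scores are made up for illustration.

```python
from statistics import mean, pstdev


def across_persona_variability(scores_by_persona):
    """Std. dev. of per-persona mean scores (assumed proxy for susceptibility)."""
    persona_means = [mean(runs) for runs in scores_by_persona.values()]
    return pstdev(persona_means)


def within_persona_variability(scores_by_persona):
    """Average std. dev. of repeated scores inside each persona.

    Lower values would correspond to higher robustness (assumed relation).
    """
    return mean(pstdev(runs) for runs in scores_by_persona.values())


# Hypothetical MFQ scores: repeated runs of one questionnaire item per persona.
shifts_with_persona = {"teacher": [3.0, 3.0, 3.0], "soldier": [1.0, 1.0, 1.0]}
noisy_within_persona = {"teacher": [2.0, 4.0], "soldier": [2.0, 4.0]}

print(across_persona_variability(shifts_with_persona),
      within_persona_variability(shifts_with_persona))
print(across_persona_variability(noisy_within_persona),
      within_persona_variability(noisy_within_persona))
```

The first set of scores changes only when the persona changes, and the second changes only between repeated runs of the same persona, so the two measures separate cleanly.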