Data-Centric Lessons To Improve Speech-Language Pretraining

arXiv.org Artificial Intelligence

Spoken Question-Answering (SQA) is a core capability for useful and interactive artificial intelligence systems. Recently, several speech-language models (SpeechLMs) have been released with a specific focus on improving their SQA performance. However, a lack of controlled ablations of pretraining data processing and curation makes it challenging to understand what factors account for performance, despite substantial gains from similar studies in other data modalities. In this work, we address this gap by conducting a data-centric exploration for pretraining SpeechLMs. We focus on three research questions fundamental to speech-language pretraining data: (1) how to process raw web-crawled audio content for speech-text pretraining, (2) how to construct synthetic pretraining datasets to augment web-crawled data and (3) how to interleave (text, audio) segments into training sequences. We apply the insights from our controlled data-centric ablations to pretrain a 3.8B-parameter SpeechLM, called SpeLangy, that outperforms models that are up to 3x larger by 10.2% absolute performance. We hope our findings highlight the impact of effective data curation for speech-language pretraining and guide future data-centric exploration in SpeechLMs.


Photorealistic Inpainting for Perturbation-based Explanations in Ecological Monitoring

arXiv.org Artificial Intelligence

Ecological monitoring is increasingly automated by vision models, yet opaque predictions limit trust and field adoption. We present an inpainting-guided, perturbation-based explanation technique that produces photorealistic, mask-localized edits that preserve scene context. Unlike masking or blurring, these edits stay in-distribution and reveal which fine-grained morphological cues drive predictions in tasks such as species recognition and trait attribution. We demonstrate the approach on a YOLOv9 detector fine-tuned for harbor seal detection in Glacier Bay drone imagery, using Segment-Anything-Model-refined masks to support two interventions: (i) object removal/replacement (e.g., replacing seals with plausible ice/water or boats) and (ii) background replacement with original animals composited onto new scenes. Explanations are assessed by re-scoring perturbed images (flip rate, confidence drop) and by expert review for ecological plausibility and interpretability. The resulting explanations localize diagnostic structures, avoid deletion artifacts common to traditional perturbations, and yield domain-relevant insights that support expert validation and more trustworthy deployment of AI in ecology.
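The two re-scoring metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not define them precisely, so the definitions here are assumptions: flip rate is taken as the share of originally detected images whose confidence falls below a threshold after perturbation, and confidence drop as the mean decrease in confidence over all images. The function name, the threshold of 0.5, and the scores are hypothetical.

```python
def rescoring_metrics(original_scores, perturbed_scores, threshold=0.5):
    """Compare detector confidences before and after a perturbation.

    Assumed definitions (not taken from the paper):
      flip rate       = fraction of originally detected images
                        (score >= threshold) that fall below the
                        threshold after perturbation
      confidence drop = mean of (original - perturbed) over all images
    """
    if len(original_scores) != len(perturbed_scores):
        raise ValueError("score lists must have the same length")
    if not original_scores:
        raise ValueError("at least one image is required")

    pairs = list(zip(original_scores, perturbed_scores))

    # Only images that were detected in the first place can flip.
    detected = [(o, p) for o, p in pairs if o >= threshold]
    flips = sum(1 for _, p in detected if p < threshold)
    flip_rate = flips / len(detected) if detected else 0.0

    mean_drop = sum(o - p for o, p in pairs) / len(pairs)
    return flip_rate, mean_drop


# Hypothetical confidences for four images before and after seal removal.
original = [0.9, 0.8, 0.7, 0.3]
perturbed = [0.2, 0.6, 0.4, 0.1]
flip_rate, mean_drop = rescoring_metrics(original, perturbed)
print(f"flip rate: {flip_rate:.2f}, mean confidence drop: {mean_drop:.2f}")
```

A high flip rate after removing only the masked region would suggest that the detector relies on that region, which is the kind of evidence the expert review is then asked to judge.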


Beyond Accuracy: Rethinking Hallucination and Regulatory Response in Generative AI

arXiv.org Artificial Intelligence

Hallucination in generative AI is often treated as a technical failure to produce factually correct output. Yet this framing underrepresents the broader significance of hallucinated content in language models, which may appear fluent, persuasive, and contextually appropriate while conveying distortions that escape conventional accuracy checks. This paper critically examines how regulatory and evaluation frameworks have inherited a narrow view of hallucination, one that prioritises surface verifiability over deeper questions of meaning, influence, and impact. We propose a layered approach to understanding hallucination risks, encompassing epistemic instability, user misdirection, and social-scale effects. Drawing on interdisciplinary sources and examining instruments such as the EU AI Act and the GDPR, we show that current governance models struggle to address hallucination when it manifests as ambiguity, bias reinforcement, or normative convergence. Rather than improving factual precision alone, we argue for regulatory responses that account for language's generative nature, the asymmetries between system and user, and the shifting boundaries between information, persuasion, and harm.


Methodological Insights into Structural Causal Modelling and Uncertainty-Aware Forecasting for Economic Indicators

arXiv.org Artificial Intelligence

This paper presents a methodological approach to financial time series analysis by combining causal discovery and uncertainty-aware forecasting. As a case study, we focus on four key U.S. macroeconomic indicators -- GDP, economic growth, inflation, and unemployment -- and we apply the LPCMCI framework with Gaussian Process Distance Correlation (GPDC) to uncover dynamic causal relationships in quarterly data from 1970 to 2021. Our results reveal a robust unidirectional causal link from economic growth to GDP and highlight the limited connectivity of inflation, suggesting the influence of latent factors. Unemployment exhibits strong autoregressive dependence, motivating its use as a case study for probabilistic forecasting. Leveraging the Chronos framework, a large language model trained for time series, we perform zero-shot predictions on unemployment. This approach delivers accurate forecasts one and two quarters ahead, without requiring task-specific training. Crucially, the model's uncertainty-aware predictions yield 90% confidence intervals, enabling effective anomaly detection through statistically principled deviation analysis. This study demonstrates the value of combining causal structure learning with probabilistic language models to inform economic policy and enhance forecasting robustness.
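The interval-based anomaly check described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: it assumes the forecaster returns a set of sampled values for one future quarter, builds a central 90% interval from empirical quantiles, and flags an observation that falls outside it. The function names and the sample values are hypothetical, and no Chronos API is called.

```python
def empirical_quantile(samples, q):
    """Linear-interpolation quantile of a list of numbers, 0 <= q <= 1."""
    ordered = sorted(samples)
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] * (1 - frac) + ordered[upper] * frac


def is_anomalous(samples, observed, level=0.90):
    """Return (flag, (low, high)) for a central interval of the given level.

    `samples` stands in for forecast draws for a single horizon
    (an assumption about the forecaster's output format).
    """
    tail = (1 - level) / 2
    low = empirical_quantile(samples, tail)
    high = empirical_quantile(samples, 1 - tail)
    return not (low <= observed <= high), (low, high)


# Hypothetical forecast draws for next quarter's unemployment rate (%).
draws = [4.0 + 0.1 * i for i in range(21)]

for value in (5.0, 14.7):
    flag, (low, high) = is_anomalous(draws, value)
    print(f"observed {value}: interval [{low:.2f}, {high:.2f}], anomalous={flag}")
```

An observation far outside the interval, as in the second case, is the kind of deviation the abstract treats as an anomaly.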


Modeling the Economic Impacts of AI Openness Regulation

arXiv.org Artificial Intelligence

Regulatory frameworks, such as the EU AI Act, encourage openness of general-purpose AI models by offering legal exemptions for "open-source" models. Despite this legislative attention on openness, the definition of open-source foundation models remains ambiguous. This paper models the strategic interactions among the creator of a general-purpose model (the generalist) and the entity that fine-tunes the general-purpose model to a specialized domain or task (the specialist), in response to regulatory requirements on model openness. We present a stylized model of the regulator's choice of an open-source definition to evaluate which AI openness standards will establish appropriate economic incentives for developers. Our results characterize market equilibria -- specifically, upstream model release decisions and downstream fine-tuning efforts -- under various openness regulations and present a range of effective regulatory penalties and open-source thresholds. Overall, we find the model's baseline performance determines when increasing the regulatory penalty vs. the open-source threshold will significantly alter the generalist's release strategy. Our model provides a theoretical foundation for AI governance decisions around openness and enables evaluation and refinement of practical open-source policies.


Forecast reconciliation with non-linear constraints

arXiv.org Machine Learning

Methods for forecasting time series adhering to linear constraints have seen notable development in recent years, especially with the advent of forecast reconciliation. This paper extends forecast reconciliation to the open question of non-linearly constrained time series. Non-linear constraints can emerge with variables that are formed as ratios such as mortality rates and unemployment rates. On the methodological side, Non-linearly Constrained Reconciliation (NLCR) is proposed. This algorithm adjusts forecasts that fail to meet non-linear constraints, in a way that ensures the new forecasts meet the constraints. The NLCR method is a projection onto a non-linear surface, formulated as a constrained optimisation problem. On the theoretical side, optimisation methods are again used, this time to derive sufficient conditions for when the NLCR methodology is guaranteed to improve forecast accuracy. Finally on the empirical side, NLCR is applied to two datasets from demography and economics and shown to significantly improve forecast accuracy relative to relevant benchmarks.
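The idea of projecting forecasts onto a non-linear constraint surface can be sketched with a ratio constraint of the kind the abstract mentions. This is a minimal illustration, not the NLCR algorithm itself. It assumes an unweighted Euclidean distance, a single constraint rate × population = deaths, and a generic iteration that repeatedly linearises the constraint and projects the base forecast onto the linearised surface. All names and numbers are hypothetical.

```python
def constraint(y):
    """g(y) = rate * population - deaths; zero when the forecasts cohere."""
    deaths, population, rate = y
    return rate * population - deaths


def gradient(y):
    """Gradient of g with respect to (deaths, population, rate)."""
    _, population, rate = y
    return [-1.0, rate, population]


def reconcile(base, tol=1e-9, max_iter=100):
    """Approximate the closest point to `base` that satisfies g(y) = 0.

    Each step linearises g around the current iterate and projects the
    base forecast onto that linearised constraint.
    """
    y = list(base)
    for _ in range(max_iter):
        grad = gradient(y)
        # Value of the linearised constraint evaluated at the base forecast.
        resid = constraint(y) + sum(
            g_i * (b_i - y_i) for g_i, b_i, y_i in zip(grad, base, y)
        )
        norm2 = sum(g_i * g_i for g_i in grad)
        new_y = [b_i - resid * g_i / norm2 for b_i, g_i in zip(base, grad)]
        step = max(abs(a - b) for a, b in zip(new_y, y))
        y = new_y
        if step < tol and abs(constraint(y)) < tol:
            break
    return y


# Incoherent base forecasts: 0.06 * 1000 = 60, but 50 deaths are forecast.
base = [50.0, 1000.0, 0.06]
deaths, population, rate = reconcile(base)
print(f"deaths={deaths:.4f}, population={population:.4f}, rate={rate:.6f}")
print(f"constraint residual: {constraint([deaths, population, rate]):.2e}")
```

Because the distance is unweighted, almost all of the adjustment lands on the rate, whose numerical scale is smallest. A weighted distance would spread the adjustment differently, so the choice of metric is a substantive modelling decision rather than a detail.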


What Does It Take to Build a Performant Selective Classifier?

arXiv.org Machine Learning

Selective classifiers improve model reliability by abstaining on inputs the model deems uncertain. However, few practical approaches achieve the gold-standard performance of a perfect-ordering oracle that accepts examples exactly in order of correctness. Our work formalizes this shortfall as the selective-classification gap and presents the first finite-sample decomposition of this gap into five distinct sources of looseness: Bayes noise, approximation error, ranking error, statistical noise, and implementation- or shift-induced slack. Crucially, our analysis reveals that monotone post-hoc calibration -- often believed to strengthen selective classifiers -- has limited impact on closing this gap, since it rarely alters the model's underlying score ranking. Bridging the gap therefore requires scoring mechanisms that can effectively reorder predictions rather than merely rescale them. We validate our decomposition on synthetic two-moons data and on real-world vision and language benchmarks, isolating each error component through controlled experiments. Our results confirm that (i) Bayes noise and limited model capacity can account for substantial gaps, (ii) only richer, feature-aware calibrators meaningfully improve score ordering, and (iii) data shift introduces a separate slack that demands distributionally robust training. Together, our decomposition yields a quantitative error budget as well as actionable design guidelines that practitioners can use to build selective classifiers which approximate ideal oracle behavior more closely.
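The claim that monotone calibration cannot change selective performance follows from the fact that a strictly increasing rescaling preserves the order of the scores. A small sketch makes this concrete. It uses temperature scaling of a binary confidence as one example of a monotone calibrator. The data, the temperature, and the function names are hypothetical and are not drawn from the paper's experiments.

```python
import math


def selective_accuracy(scores, correct, coverage):
    """Accuracy on the top `coverage` fraction of examples ranked by score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    k = max(1, round(coverage * len(scores)))
    kept = order[:k]
    return sum(correct[i] for i in kept) / k


def temperature_scale(score, temperature=2.0):
    """Strictly increasing rescaling of a confidence in (0, 1)."""
    logit = math.log(score / (1 - score))
    return 1 / (1 + math.exp(-logit / temperature))


# Hypothetical confidences and correctness indicators (1 = correct).
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.55]
correct = [1, 1, 0, 1, 0, 1]
calibrated = [temperature_scale(s) for s in scores]

for coverage in (0.5, 1.0):
    before = selective_accuracy(scores, correct, coverage)
    after = selective_accuracy(calibrated, correct, coverage)
    print(f"coverage {coverage}: before={before:.3f}, after={after:.3f}")
```

The calibrated confidences differ from the originals, yet the accepted set at each coverage level is identical, so the accuracy at that coverage cannot move. Only a scoring function that reorders examples can change it.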


Urgent warning over cannabis as UK's top psychiatrist warns it isn't safe for young brains still developing

Daily Mail - Science & tech

It may seem like a relatively harmless rite of passage. But cannabis isn't safe for young brains still developing, the UK's top psychiatrist has warned.


Shocking map reveals where power-hungry data centers could spark next public health disaster in the US

Daily Mail - Science & tech

A growing network of at least 5,000 data centers across the US is becoming a hidden public health threat, scientists have warned. That is because the energy-hungry backbone of artificial intelligence pumps out dangerous pollutants that can cause asthma, cancer and even death.


Endangered North Atlantic right whales are making a slow comeback

Popular Science

The North Atlantic right whale is one of the most endangered large whales. Their very name references their devastating decline: they were the "right" whales for whalers to target, since the animals floated after being killed. Today, their biggest threats are ship collisions and getting tangled in fishing gear. Estimates for North Atlantic right whale populations are slowly increasing, according to a New England Aquarium statement.