sheffield
Fast, memory-efficient genomic interval tokenizers for modern machine learning
LeRoy, Nathan J., Campbell, Donald R. Jr, Stadick, Seth, Khoroshevskyi, Oleksandr, Park, Sang-Hoon, Hu, Ziyang, Sheffield, Nathan C.
Introduction: Epigenomic datasets from high-throughput sequencing experiments are commonly summarized as genomic intervals. As the volume of this data grows, so does interest in analyzing it through deep learning. However, the heterogeneity of genomic interval data, where each dataset defines its own regions, creates barriers for machine learning methods that require consistent, discrete vocabularies. Methods: We introduce gtars-tokenizers, a high-performance library that maps genomic intervals to a predefined universe or vocabulary of regions, analogous to text tokenization in natural language processing. Built in Rust with bindings for Python, R, CLI, and WebAssembly, gtars-tokenizers implements two overlap methods (BITS and AIList) and integrates seamlessly with modern ML frameworks through Hugging Face-compatible APIs. Results: The gtars-tokenizers package achieves top efficiency for large-scale datasets, while enabling genomic intervals to be processed using standard ML workflows in PyTorch and TensorFlow without ad hoc preprocessing. This token-based approach bridges genomics and machine learning, supporting scalable and standardized analysis of interval data across diverse computational environments. Availability: PyPI and GitHub: https://github.com/databio/gtars.
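The core operation described above is assigning each input interval the ids of the universe regions it overlaps. It can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions. It is not the gtars-tokenizers API and not its BITS or AIList implementation. The class name `IntervalTokenizer`, the fallback "unknown" id, and the half-open `[start, end)` coordinate convention are hypothetical. The sketch also assumes that universe regions do not overlap one another on a chromosome. That assumption lets a plain binary search (Python's `bisect`) stand in for a dedicated interval index.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


class IntervalTokenizer:
    """Hypothetical sketch: map query intervals to ids of overlapping universe regions.

    Assumes half-open [start, end) coordinates and non-overlapping universe
    regions per chromosome, so sorting by start also sorts by end.
    Not the gtars-tokenizers API.
    """

    def __init__(self, universe):
        # universe: list of (chrom, start, end); token id = position in the list
        self.unk_id = len(universe)  # hypothetical id for intervals with no overlap
        by_chrom = defaultdict(list)
        for token_id, (chrom, start, end) in enumerate(universe):
            by_chrom[chrom].append((start, end, token_id))
        self.index = {}
        for chrom, regions in by_chrom.items():
            regions.sort()
            self.index[chrom] = (
                [r[0] for r in regions],  # sorted starts
                [r[1] for r in regions],  # sorted ends (valid only if non-overlapping)
                [r[2] for r in regions],  # token ids in the same order
            )

    def tokenize(self, intervals):
        tokens = []
        for chrom, q_start, q_end in intervals:
            if chrom not in self.index:
                tokens.append(self.unk_id)
                continue
            starts, ends, ids = self.index[chrom]
            lo = bisect_right(ends, q_start)  # first region with end > q_start
            hi = bisect_left(starts, q_end)   # first region with start >= q_end
            hits = ids[lo:hi]
            tokens.extend(hits if hits else [self.unk_id])
        return tokens


universe = [("chr1", 0, 100), ("chr1", 100, 200), ("chr2", 50, 150)]
tokenizer = IntervalTokenizer(universe)
print(tokenizer.tokenize([("chr1", 90, 110), ("chr2", 10, 40), ("chr3", 0, 10)]))
# → [0, 1, 3, 3]
```

Because the universe is fixed, every dataset is expressed over the same discrete vocabulary. Downstream models can then treat these ids like word ids in NLP. A universe whose regions overlap would need a real interval index, such as the BITS or AIList methods named above.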
- North America > United States > Virginia > Albemarle County > Charlottesville (0.06)
- North America > United States > California (0.04)
Enhancing Speech Emotion Recognition via Fine-Tuning Pre-Trained Models and Hyper-Parameter Optimisation
We propose a workflow for speech emotion recognition (SER) that combines pre-trained representations with automated hyperparameter optimisation (HPO). Using the SpeechBrain wav2vec2-base model fine-tuned on IEMOCAP as the encoder, we compare two HPO strategies, Gaussian Process Bayesian Optimisation (GP-BO) and Tree-structured Parzen Estimators (TPE), under an identical four-dimensional search space and a 15-trial budget, with balanced class accuracy (BCA) on the German EmoDB corpus as the objective. All experiments run on 8 CPU cores with 32 GB RAM. GP-BO achieves 0.96 BCA in 11 minutes, and TPE (Hyperopt implementation) attains 0.97 in 15 minutes. In contrast, grid search requires 143 trials and 1,680 minutes to exceed 0.9 BCA, and the best AutoSpeech 2020 baseline reports only 0.85 in 30 minutes on a GPU. For cross-lingual generalisation, an EmoDB-trained HPO-tuned model improves zero-shot accuracy by 0.25 on CREMA-D and 0.26 on RAVDESS. Results show that efficient HPO with pre-trained encoders delivers competitive SER on commodity CPUs. Source code for this work is available at: https://github.com/youngaryan/speechbrain-emotion-hpo.
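Balanced class accuracy, the HPO objective above, is commonly defined as the mean of per-class recall. This keeps a majority emotion class from dominating the score. The abstract does not spell out the exact formula, so the Python sketch below assumes that standard definition, which is the same one used by scikit-learn's `balanced_accuracy_score`. The function name and the toy labels are illustrative only.

```python
from collections import Counter


def balanced_class_accuracy(y_true, y_pred):
    """Mean per-class recall over the classes present in y_true (assumed BCA definition)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    support = Counter(y_true)  # number of true examples per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # hits per class
    return sum(correct[c] / support[c] for c in support) / len(support)


# Toy, imbalanced example: plain accuracy would be 5/6, but one "sad" clip is missed.
y_true = ["anger", "anger", "anger", "anger", "sad", "sad"]
y_pred = ["anger", "anger", "anger", "anger", "anger", "sad"]
print(balanced_class_accuracy(y_true, y_pred))  # → 0.75
```

In an HPO loop, GP-BO or TPE would propose hyperparameters, train the classifier head, and maximise this value on held-out data. That loop is not shown here.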
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.90)
- Information Technology > Artificial Intelligence > Cognitive Science > Emotion (0.73)
- Information Technology > Artificial Intelligence > Natural Language (0.68)
Explainable Deep Anomaly Detection with Sequential Hypothesis Testing for Robotic Sewer Inspection
George, Alex, Shepherd, Will, Tait, Simon, Mihaylova, Lyudmila, Anderson, Sean R.
Sewer pipe faults, such as leaks and blockages, can lead to severe consequences including groundwater contamination, property damage, and service disruption. Traditional inspection methods rely heavily on the manual review of CCTV footage collected by mobile robots, which is inefficient and susceptible to human error. To automate this process, we propose a novel system incorporating explainable deep learning anomaly detection combined with sequential probability ratio testing (SPRT). The anomaly detector processes single image frames, providing interpretable spatial localisation of anomalies, whilst the SPRT introduces temporal evidence aggregation, enhancing robustness against noise over sequences of image frames. Experimental results demonstrate improved anomaly detection performance, highlighting the benefits of the combined spatiotemporal analysis system for reliable and robust sewer inspection.
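The temporal aggregation step can be illustrated with Wald's sequential probability ratio test. Each frame adds a log-likelihood ratio to a running sum, and the test stops as soon as the sum crosses a threshold set by the target error rates. The abstract does not say how detector outputs enter the SPRT. This Python sketch therefore assumes, hypothetically, that each frame is reduced to a binary "flagged anomalous" decision, with assumed flag probabilities `p0` for a normal pipe and `p1` for a faulty one. The parameter values are illustrative, not taken from the paper.

```python
import math


def sprt(frame_flags, p0=0.1, p1=0.6, alpha=0.05, beta=0.05):
    """Wald's SPRT over binary per-frame detector outputs (hypothetical setup).

    frame_flags: iterable of 0/1 values (1 = frame flagged anomalous).
    p0, p1: assumed probability a frame is flagged under H0 (normal) and H1 (fault).
    alpha, beta: target false-alarm and miss rates.
    Returns (decision, frames_used), decision in {"fault", "normal", "undecided"}.
    """
    if not (0 < p0 < p1 < 1):
        raise ValueError("require 0 < p0 < p1 < 1")
    upper = math.log((1 - beta) / alpha)  # accept H1 (fault) at or above this
    lower = math.log(beta / (1 - alpha))  # accept H0 (normal) at or below this
    llr = 0.0
    n = 0
    for n, flag in enumerate(frame_flags, start=1):
        # Accumulate the per-frame log-likelihood ratio log P(x|H1) / P(x|H0).
        llr += math.log(p1 / p0) if flag else math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "fault", n
        if llr <= lower:
            return "normal", n
    return "undecided", n


print(sprt([1, 1, 0, 1]))           # → ('fault', 2)
print(sprt([0, 1, 0, 0, 0, 0, 0]))  # → ('normal', 7)
```

The second call shows the robustness argument in miniature: one spurious flagged frame delays the "normal" decision but does not trigger a false alarm.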
- Europe > United Kingdom > England > South Yorkshire > Sheffield (0.05)
- Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
Mini robots detect and fix water pipe leaks without digging
Fixing underground water pipes usually means digging up roads and sidewalks, a process that's disruptive and expensive. However, researchers at the University of Sheffield in the U.K. are working on a different approach. They've developed small robots called "Pipebots" that can travel inside water pipes to find and potentially repair leaks, all without any excavation.
Overcoming Overfitting in Reinforcement Learning via Gaussian Process Diffusion Policy
Horprasert, Amornyos, Apriaskar, Esa, Liu, Xingyu, Su, Lanlan, Mihaylova, Lyudmila S.
One of the key challenges that Reinforcement Learning (RL) faces is its limited capability to adapt to changes in data distribution caused by uncertainties. This challenge arises especially in RL systems using deep neural networks as decision makers or policies, which are prone to overfitting after prolonged training on fixed environments. To address this challenge, this paper proposes Gaussian Process Diffusion Policy (GPDP), a new algorithm that integrates diffusion models and Gaussian Process Regression (GPR) to represent the policy. GPR guides diffusion models to generate actions that maximize the learned Q-function, resembling policy improvement in RL. Furthermore, the kernel-based nature of GPR enhances the policy's exploration efficiency under distribution shifts at test time, increasing the chance of discovering new behaviors and mitigating overfitting. Simulation results on the Walker2d benchmark show that our approach outperforms state-of-the-art algorithms under distribution-shift conditions, achieving around 67.74% to 123.18% improvement in the RL objective function while maintaining comparable performance under normal conditions.
- Asia > Indonesia > Java > Central Java > Semarang (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > United Kingdom > England > Greater Manchester > Manchester (0.04)
- Asia > Thailand (0.04)
Liverpool is crypto capital of UK, survey finds
The city's most famous sons may have sung that money can't buy you love, but that was before bitcoin existed. Liverpool has emerged as the crypto capital of the UK, according to a study looking at the online habits of people across the country. The survey, conducted by telecommunications company Openreach, found that 13% of respondents from Liverpool regularly invest in cryptocurrency and check stocks, more than anywhere else in Britain. Different cities across the UK proved to be hotspots for various activities. London seems to be the online dating capital of Britain, with 24% of respondents saying they engage with dating apps on at least three days a week.
- Telecommunications (1.00)
- Banking & Finance > Trading (1.00)
Multimodal Lead-Specific Modeling of ECG for Low-Cost Pulmonary Hypertension Assessment
Suvon, Mohammod N. I., Zhou, Shuo, Tripathi, Prasun C., Fan, Wenrui, Alabed, Samer, Khanal, Bishesh, Osmani, Venet, Swift, Andrew J., Chen, Chen, Lu, Haiping
Pulmonary hypertension (PH) is frequently underdiagnosed in low- and middle-income countries (LMICs), primarily due to the scarcity of advanced diagnostic tools. Several studies in PH have applied machine learning to low-cost diagnostic tools like 12-lead ECG (12L-ECG), but they mainly focus on areas with limited resources, overlooking areas with no diagnostic tools, such as rural primary healthcare in LMICs. Recent studies have shown the effectiveness of 6-lead ECG (6L-ECG) as a cheaper and portable alternative for detecting various cardiac conditions, but its clinical value for PH detection is not well established. Furthermore, existing methods treat 12L-/6L-ECG as a single modality, capturing only shared features while overlooking lead-specific features essential for identifying complex cardiac hemodynamic changes. In this paper, we propose Lead-Specific Electrocardiogram Multimodal Variational Autoencoder (LS-EMVAE), a model pre-trained on large-population 12L-ECG data and fine-tuned on task-specific data (12L-ECG or 6L-ECG). LS-EMVAE models each 12L-ECG lead as a separate modality and introduces a hierarchical expert composition using Mixture and Product of Experts for adaptive latent feature fusion between lead-specific and shared features. Unlike existing approaches, LS-EMVAE makes better predictions on both 12L-ECG and 6L-ECG at inference, making it an equitable solution for areas with limited or no diagnostic tools. We pre-trained LS-EMVAE on 800,000 publicly available 12L-ECG samples and fine-tuned it for two tasks: 1) PH detection and 2) phenotyping pre-/post-capillary PH, on in-house datasets of 892 and 691 subjects across 12L-ECG and 6L-ECG settings. Extensive experiments show that LS-EMVAE outperforms existing baselines in both ECG settings, while 6L-ECG achieves performance comparable to 12L-ECG, unlocking its potential for global PH screening in areas without diagnostic tools.
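The "Product of Experts" half of the fusion above has a well-known closed form when each expert outputs a diagonal Gaussian. The product is again Gaussian: its precision is the sum of the experts' precisions, and its mean is the precision-weighted mean. The Python sketch below shows only that standard rule. It does not reproduce LS-EMVAE's hierarchical Mixture/Product composition, which the abstract does not detail. The optional standard-normal prior expert follows MVAE-style models and is an assumption here.

```python
def product_of_gaussian_experts(mus, variances, include_prior=True):
    """Fuse diagonal Gaussian experts N(mu_i, var_i) with a product of experts.

    mus, variances: one list per expert (e.g. one per ECG lead), same length each.
    The product of Gaussians has precision = sum of precisions and
    mean = precision-weighted mean. The N(0, 1) prior expert is an assumption.
    """
    mus = [list(m) for m in mus]
    variances = [list(v) for v in variances]
    dim = len(mus[0])
    if include_prior:
        mus.append([0.0] * dim)
        variances.append([1.0] * dim)
    fused_mu, fused_var = [], []
    for d in range(dim):
        precisions = [1.0 / v[d] for v in variances]
        total = sum(precisions)
        fused_var.append(1.0 / total)
        fused_mu.append(sum(p * m[d] for p, m in zip(precisions, mus)) / total)
    return fused_mu, fused_var


# Two hypothetical lead experts: a confident one (var 0.5) and an uncertain one (var 2.0).
mu, var = product_of_gaussian_experts([[1.0], [4.0]], [[0.5], [2.0]], include_prior=False)
print(mu, var)  # fused mean ≈ 1.6, fused variance ≈ 0.4
```

The confident expert pulls the fused mean toward itself, and the fused variance is smaller than either input. This is why a product is useful for sharpening shared evidence. A mixture, by contrast, averages experts and keeps each one's contribution separate.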
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- Asia > India (0.14)
Foundation-Model-Boosted Multimodal Learning for fMRI-based Neuropathic Pain Drug Response Prediction
Fan, Wenrui, Rizky, L. M. Riza, Zhang, Jiayang, Chen, Chen, Lu, Haiping, Teh, Kevin, Selvarajah, Dinesh, Zhou, Shuo
Neuropathic pain, affecting up to 10% of adults, remains difficult to treat due to limited therapeutic efficacy and tolerability. Although resting-state functional MRI (rs-fMRI) is a promising non-invasive measurement of brain biomarkers to predict drug response in therapeutic development, the complexity of fMRI demands machine learning models with substantial capacity. However, extreme data scarcity in neuropathic pain research limits the application of high-capacity models. To address the challenge of data scarcity, we propose FMM$_{TC}$, a Foundation-Model-boosted Multimodal learning framework for fMRI-based neuropathic pain drug response prediction, which leverages both internal multimodal information in pain-specific data and external knowledge from large pain-agnostic data. Specifically, to maximize the value of limited pain-specific data, FMM$_{TC}$ integrates complementary information from two rs-fMRI modalities: Time series and functional Connectivity. FMM$_{TC}$ is further boosted by an fMRI foundation model whose external knowledge, drawn from extensive pain-agnostic fMRI datasets, enriches the limited pain-specific information. Evaluations with an in-house dataset and a public dataset from OpenNeuro demonstrate FMM$_{TC}$'s superior representation ability, generalizability, and cross-dataset adaptability over existing unimodal fMRI models that only consider one of the rs-fMRI modalities. The ablation study validates the effectiveness of multimodal learning and foundation-model-powered external knowledge transfer in FMM$_{TC}$. An integrated gradient-based interpretation study explains how FMM$_{TC}$'s cross-dataset dynamic behaviors enhance its adaptability. In conclusion, FMM$_{TC}$ can support clinical trials in neuropathic pain therapeutic development by accurately predicting drug responses to improve participant stratification efficiency.
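The two rs-fMRI modalities named above are related. A functional connectivity matrix is usually derived from the regional time series, most often as the Pearson correlation between every pair of region-of-interest (ROI) signals. The abstract does not state which connectivity measure FMM$_{TC}$ uses. This Python sketch assumes Pearson correlation. The function names and toy signals are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length signals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant signal")
    return cov / (sx * sy)


def functional_connectivity(roi_series):
    """Symmetric ROI-by-ROI correlation matrix from per-ROI time series (assumed FC definition)."""
    k = len(roi_series)
    fc = [[1.0] * k for _ in range(k)]  # diagonal: each ROI correlates perfectly with itself
    for i in range(k):
        for j in range(i + 1, k):
            fc[i][j] = fc[j][i] = pearson(roi_series[i], roi_series[j])
    return fc


# Three toy ROI signals over four time points.
time_series = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
fc = functional_connectivity(time_series)
# fc[0][1] is +1 (same shape), fc[0][2] is -1 (mirror image), up to float rounding
```

The time-series view keeps temporal dynamics that the correlation matrix averages away, while the matrix gives a compact, order-free summary. That difference is one intuition for why combining them can be complementary.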
- Europe > United Kingdom > England > South Yorkshire > Sheffield (0.05)
- Europe > United Kingdom > England > Greater London > London (0.04)
- Europe > Spain > Andalusia > Granada Province > Granada (0.04)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Health & Medicine > Health Care Technology (1.00)
Exploring the Feasibility of Deep Learning Models for Long-term Disease Prediction: A Case Study for Wheat Yellow Rust in England
Yuan, Zhipeng, Zhang, Yu, Bi, Gaoshan, Yang, Po
Wheat yellow rust, caused by the fungus Puccinia striiformis, is a critical disease affecting wheat crops across Britain, leading to significant yield losses and economic consequences. Given rapid environmental change and the evolving virulence of pathogens, there is a growing need for innovative approaches to predict and manage such diseases over the long term. This study explores the feasibility of using deep learning models to predict outbreaks of wheat yellow rust in British fields, offering a proactive approach to disease management. We construct a yellow rust dataset with historical weather information and disease indicators across multiple regions in England. We employ two powerful deep learning models, fully connected neural networks and long short-term memory (LSTM) networks, to develop predictive models capable of recognizing patterns and predicting future disease outbreaks. The models are trained and validated on randomly sliced datasets. The performance of these models at different predictive time steps is evaluated based on accuracy, precision, recall, and F1-score. Preliminary results indicate that deep learning models can effectively capture the complex interactions between multiple factors influencing disease dynamics, demonstrating a promising capacity to forecast wheat yellow rust with considerable accuracy. Specifically, the fully connected neural network achieved 83.65% accuracy in a disease prediction task with a 6-month predictive time step. These findings highlight the potential of deep learning to transform disease management strategies, enabling earlier and more precise interventions. Our study not only provides a methodological framework for employing deep learning in agricultural settings but also opens avenues for future research to enhance the robustness and applicability of predictive models in combating crop diseases globally.
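A "predictive time step" of six months means pairing the inputs observed at month *t* with the disease indicator at month *t* + 6. The Python sketch below builds such pairs, performs a random split like the one described, and computes the four reported metrics for a binary indicator. The monthly layout, the function names, and the toy data are hypothetical. The models themselves (the fully connected network and the LSTM) are not shown.

```python
import random


def make_horizon_samples(features, labels, horizon):
    """Pair features at month t with the disease label at month t + horizon (hypothetical layout)."""
    if len(features) != len(labels):
        raise ValueError("features and labels must cover the same months")
    if not 0 <= horizon < len(labels):
        raise ValueError("horizon must be within the series length")
    return [(features[t], labels[t + horizon]) for t in range(len(features) - horizon)]


def random_split(samples, val_fraction=0.2, seed=0):
    """Shuffle and split into (train, validation); a random, not temporal, hold-out."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for 0/1 labels (0.0 when undefined)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy data for one region: 12 months of [temperature, rainfall]-style features.
features = [[float(m), float(m % 3)] for m in range(12)]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0]
samples = make_horizon_samples(features, labels, horizon=6)  # 6 (features, label) pairs
train, val = random_split(samples)
scores = binary_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
```

Note that a random split of time-ordered samples can place neighbouring months in both the training and validation sets. A split by year would be a stricter test of long-term forecasting.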
- Europe > United Kingdom > England > South Yorkshire > Sheffield (0.05)
- North America > United States (0.04)
- Research Report > Promising Solution (0.48)
- Research Report > Experimental Study (0.34)
- Overview > Innovation (0.34)
- Health & Medicine (1.00)
- Food & Agriculture > Agriculture (1.00)
What is your hometown known for? Interactive map reveals the unexpected UK towns and villages where world-famous gadgets were invented - from the TV to the toothbrush
There's no doubt Great Britain lays claim to some of the greatest scientific discoveries and inventions that have changed the face of modern society. Now, MailOnline's interactive map reveals the birthplace of 30 of these famous British marvels, from stainless steel to the jet engine and the electric motor. Who can forget Alan Turing's Bombe machine, used to break Enigma-enciphered messages about enemy military operations during WWII? Turing developed the Bombe in 1939 at Bletchley Park in Buckinghamshire and hundreds were built, marking a crucial contribution to the war effort. Also on the map is the hovercraft, invented by Christopher Cockerell in 1955 and first launched four years later on the Isle of Wight.
- Europe > United Kingdom > England > Buckinghamshire > Milton Keynes (0.25)
- Europe > United Kingdom > England > Isle of Wight (0.25)
- Europe > Germany (0.06)
- Health & Medicine (1.00)
- Government > Military (1.00)
- Materials > Metals & Mining > Steel (0.69)
- Information Technology > Security & Privacy (0.55)
- Information Technology > Artificial Intelligence > History (0.55)