AITopics

Loop closure detection is important for simultaneous localization and mapping (SLAM), which associates current observations with historical keyframes, achieving drift correction and global relocalization. However, a falsely detected loop can be fatal, and this is especially difficult in repetitive environments where appearance-based features fail due to the high similarity. Therefore, verification of a loop closure is a critical step in avoiding false positive detections. Existing works in loop closure verification predominantly focus on learning invariant appearance features, neglecting the prior knowledge of the robot's spatial-temporal motion cue, i.e., trajectory. In this letter, we propose ROVER, a loop closure verification method that leverages the historical trajectory as a prior constraint to reject false loops in challenging repetitive environments. For each loop candidate, it is first used to estimate the robot trajectory with pose-graph optimization. This trajectory is then submitted to a scoring scheme that assesses its compliance with the trajectory without the loop, which we refer to as the trajectory prior, to determine if the loop candidate should be accepted. Benchmark comparisons and real-world experiments demonstrate the effectiveness of the proposed method. Furthermore, we integrate ROVER into state-of-the-art SLAM systems to verify its robustness and efficiency. Our source code and self-collected dataset are available at https://github.com/jarvisyjw/ROVER.

artificial intelligence, machine learning, natural language, (20 more...)

2508.13488

Country:

Asia > China (0.28)
Asia > Japan (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.35)

Hierarchy-Consistent Learning and Adaptive Loss Balancing for Hierarchical Multi-Label Classification

Jiang, Ruobing, Liu, Mengzhe, Liu, Haobing, Yu, Yanwei

Hierarchical Multi-Label Classification (HMC) faces critical challenges in maintaining structural consistency and balancing loss weighting in Multi-Task Learning (MTL). In order to address these issues, we propose a classifier called HCAL based on MTL integrated with prototype contrastive learning and adaptive task-weighting mechanisms. The most significant advantage of our classifier is semantic consistency including both prototype with explicitly modeling label and feature aggregation from child classes to parent classes. The other important advantage is an adaptive loss-weighting mechanism that dynamically allocates optimization resources by monitoring task-specific convergence rates. It effectively resolves the "one-strong-many-weak" optimization bias inherent in traditional MTL approaches. To further enhance robustness, a prototype perturbation mechanism is formulated by injecting controlled noise into prototype to expand decision boundaries. Additionally, we formalize a quantitative metric called Hierarchical Violation Rate (HVR) as to evaluate hierarchical consistency and generalization. Extensive experiments across three datasets demonstrate both the higher classification accuracy and reduced hierarchical violation rate of the proposed classifier over baseline models.

artificial intelligence, machine learning, prototype, (17 more...)

doi: 10.1145/3746252.3761241

2508.13452

Country:

North America > United States (0.28)
Asia > China > Shandong Province (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

Whispering Context: Distilling Syntax and Semantics for Long Speech Transcripts

Altinok, Duygu

ASR systems often struggle with maintaining syntactic and semantic accuracy in long audio transcripts, impacting tasks like Named Entity Recognition (NER), capitalization, and punctuation. We propose a novel approach that enhances ASR by distilling contextual knowledge from LLaMA models into Whisper. Our method uses two strategies: (1) token level distillation with optimal transport to align dimensions and sequence lengths, and (2) representation loss minimization between sentence embeddings of Whisper and LLaMA, blending syntax and semantics. Evaluations on the Spoken Wikipedia dataset, a benchmark with long audios and rich entities demonstrate significant improvements in Word Error Rate (WER), NER, capitalization, and punctuation success. By introducing novel NER metrics and exploring semantics aware ASR, our work highlights the value of integrating linguistic context into transcription, setting a foundation for robust, context-aware ASR in longform speech.

distillation, machine learning, natural language, (20 more...)

2508.13376

Country: North America > United States (1.00)

Genre: Research Report (1.00)

Industry:

Law (0.68)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.48)

Vehicle detection from GSV imagery: Predicting travel behaviour for cycling and motorcycling using Computer Vision

Kyriaki, null, Kokka, null, Goel, Rahul, Abbas, Ali, Nice, Kerry A., Martial, Luca, Labib, SM, Ke, Rihuan, Schönlieb, Carola Bibiane, Woodcock, James

Transportation influence health by shaping exposure to physical activity, air pollution and injury risk. Comparative data on cycling and motorcycling behaviours is scarce, particularly at a global scale. Street view imagery, such as Google Street View (GSV), combined with computer vision, is a valuable resource for efficiently capturing travel behaviour data. This study demonstrates a novel approach using deep learning on street view images to estimate cycling and motorcycling levels across diverse cities worldwide. We utilized data from 185 global cities. The data on mode shares of cycling and motorcycling estimated using travel surveys or censuses. We used GSV images to detect cycles and motorcycles in sampled locations, using 8000 images per city. The YOLOv4 model, fine-tuned using images from six cities, achieved a mean average precision of 89% for detecting cycles and motorcycles. A global prediction model was developed using beta regression with city-level mode shares as outcome, with log transformed explanatory variables of counts of GSV-detected images with cycles and motorcycles, while controlling for population density. We found strong correlations between GSV motorcycle counts and motorcycle mode share (0.78) and moderate correlations between GSV cycle counts and cycling mode share (0.51). Beta regression models predicted mode shares with $R^2$ values of 0.614 for cycling and 0.612 for motorcycling, achieving median absolute errors (MDAE) of 1.3% and 1.4%, respectively. Scatterplots demonstrated consistent prediction accuracy, though cities like Utrecht and Cali were outliers. The model was applied to 60 cities globally for which we didn't have recent mode share data. We provided estimates for some cities in the Middle East, Latin America and East Asia. With computer vision, GSV images capture travel modes and activity, providing insights alongside traditional data sources.

artificial intelligence, deep learning, machine learning, (19 more...)

2508.12794

Country:

South America (1.00)
North America (1.00)
Africa (1.00)
(2 more...)

Genre: Research Report (1.00)

Industry:

Transportation > Passenger (1.00)
Transportation > Ground > Road (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.48)

Azam, Md Basit, Singh, Sarangthem Ibotombi

Age-Normalized HRV Features for Non-Invasive Glucose Prediction: A Pilot Sleep-Aware Machine Learning Study

Non-invasive glucose monitoring remains a critical challenge in the management of diabetes. HRV during sleep shows promise for glucose prediction however, age-related autonomic changes significantly confound traditional HRV analyses. We analyzed 43 subjects with multi-modal data including sleep-stage specific ECG, HRV features, and clinical measurements. A novel age-normalization technique was applied to the HRV features by, dividing the raw values by age-scaled factors. BayesianRidge regression with 5-fold cross-validation was employed for log-glucose prediction. Age-normalized HRV features achieved R2 = 0.161 (MAE = 0.182) for log-glucose prediction, representing a 25.6% improvement over non-normalized features (R2 = 0.132). The top predictive features were hrv rem mean rr age normalized (r = 0.443, p = 0.004), hrv ds mean rr age normalized (r = 0.438, p = 0.005), and diastolic blood pressure (r = 0.437, p = 0.005). Systematic ablation studies confirmed age-normalization as the critical component, with sleep-stage specific features providing additional predictive value. Age-normalized HRV features significantly enhance glucose prediction accuracy compared with traditional approaches. This sleep-aware methodology addresses fundamental limitations in autonomic function assessment and suggests a preliminary feasibility for non-invasive glucose monitoring applications. However, these results require validation in larger cohorts before clinical consideration.

artificial intelligence, hrv feature, machine learning, (11 more...)

2508.11682

Country: North America > United States (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Sleep (1.00)
Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Health Care Technology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.50)

Hassan, Kazi Mahmudul, Zhao, Xuyang, Sugano, Hidenori, Tanaka, Toshihisa

MR-EEGWaveNet: Multiresolutional EEGWaveNet for Seizure Detection from Long EEG Recordings

Feature engineering for generalized seizure detection models remains a significant challenge. Recently proposed models show variable performance depending on the training data and remain ineffective at accurately distinguishing artifacts from seizure data. In this study, we propose a novel end-to-end model, "Multiresolutional EEGWaveNet (MR-EEGWaveNet)," which efficiently distinguishes seizure events from background electroencephalogram (EEG) and artifacts/noise by capturing both temporal dependencies across different time frames and spatial relationships between channels. The model has three modules: convolution, feature extraction, and predictor. The convolution module extracts features through depth-wise and spatio-temporal convolution. The feature extraction module individually reduces the feature dimension extracted from EEG segments and their sub-segments. Subsequently, the extracted features are concatenated into a single vector for classification using a fully connected classifier called the predictor module. In addition, an anomaly score-based post-classification processing technique is introduced to reduce the false-positive rates of the model. Experimental results are reported and analyzed using different parameter settings and datasets (Siena (public) and Juntendo (private)). The proposed MR-EEGWaveNet significantly outperformed the conventional non-multiresolution approach, improving the F1 scores from 0.177 to 0.336 on Siena and 0.327 to 0.488 on Juntendo, with precision gains of 15.9% and 20.62%, respectively.

artificial intelligence, data mining, machine learning, (18 more...)

2505.17972

Country: Asia > Japan > Honshū > Kantō (0.46)

Genre: Research Report > New Finding (0.34)

Industry:

Health & Medicine > Therapeutic Area > Neurology > Epilepsy (0.69)
Health & Medicine > Therapeutic Area > Genetic Disease (0.69)
Health & Medicine > Health Care Technology (0.68)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Wei, Changshuai, Lu, Qing

Collapsing ROC approach for risk prediction research on both common and rare variants

Risk prediction that capitalizes on emerging genetic findings holds great promise for improving public health and clinical care. However, recent risk prediction research has shown that predictive tests formed on existing common genetic loci, including those from genome-wide association studies, have lacked sufficient accuracy for clinical use. Because most rare variants on the genome have not yet been studied for their role in risk prediction, future disease prediction discoveries should shift toward a more comprehensive risk prediction strategy that takes into account both common and rare variants. We are proposing a collapsing receiver operating characteristic CROC approach for risk prediction research on both common and rare variants. The new approach is an extension of a previously developed forward ROC FROC approach, with additional procedures for handling rare variants. The approach was evaluated through the use of 533 single-nucleotide polymorphisms SNPs in 37 candidate genes from the Genetic Analysis Workshop 17 mini-exome data set. We found that a prediction model built on all SNPs gained more accuracy AUC = 0.605 than one built on common variants alone AUC = 0.585. We further evaluated the performance of two approaches by gradually reducing the number of common variants in the analysis. We found that the CROC method attained more accuracy than the FROC method when the number of common variants in the data decreased. In an extreme scenario, when there are only rare variants in the data, the CROC reached an AUC value of 0.603, whereas the FROC had an AUC value of 0.524.

artificial intelligence, machine learning, variant, (14 more...)

doi: 10.1186/1753-6561-5-S9-S42

2508.13552

Country: North America > United States > Michigan > Ingham County (0.14)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Bahador, Nooshin, Lankarany, Milad

Classifying Clinical Outcome of Epilepsy Patients with Ictal Chirp Embeddings

This study presents a pipeline leveraging t-Distributed Stochastic Neighbor Embedding (t-SNE) for interpretable visualizations of chirp features across diverse outcome scenarios. The dataset, comprising chirp-based temporal, spectral, and frequency metrics. Using t-SNE, local neighborhood relationships were preserved while addressing the crowding problem through Student t-distribution-based similarity optimization. Three classification tasks were formulated on the 2D t-SNE embeddings: (1) distinguishing clinical success from failure/no-resection, (2) separating high-difficulty from low-difficulty cases, and (3) identifying optimal cases, defined as successful outcomes with minimal clinical difficulty. Four classifiers, namely, Random Forests, Support Vector Machines, Logistic Regression, and k-Nearest Neighbors, were trained and evaluated using stratified 5-fold cross-validation. Across tasks, the Random Forest and k-NN classifiers demonstrated superior performance, achieving up to 88.8% accuracy in optimal case detection (successful outcomes with minimal clinical difficulty). Additionally, feature influence sensitivity maps were generated using SHAP explanations applied to model predicting t-SNE coordinates, revealing spatially localized feature importance within the embedding space. These maps highlighted how specific chirp attributes drive regional clustering and class separation, offering insights into the latent structure of the data. The integrated framework showcases the potential of interpretable embeddings and local feature attribution for clinical stratification and decision support.

artificial intelligence, logistic regression, machine learning, (17 more...)

2508.13476

Country: North America > Canada > Ontario > Toronto (0.15)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology > Epilepsy (1.00)
Health & Medicine > Therapeutic Area > Genetic Disease (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.57)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.54)

Ozdemir, Zeynep, Keles, Hacer Yalim, Tanriover, Omer Ozgur

CLoE: Curriculum Learning on Endoscopic Images for Robust MES Classification

Estimating disease severity from endoscopic images is essential in assessing ulcerative colitis, where the Mayo Endoscopic Subscore (MES) is widely used to grade inflammation. However, MES classification remains challenging due to label noise from inter-observer variability and the ordinal nature of the score, which standard models often ignore. We propose CLoE, a curriculum learning framework that accounts for both label reliability and ordinal structure. Image quality, estimated via a lightweight model trained on Boston Bowel Preparation Scale (BBPS) labels, is used as a proxy for annotation confidence to order samples from easy (clean) to hard (noisy). This curriculum is further combined with ResizeMix augmentation to improve robustness. Experiments on the LIMUC and HyperKvasir datasets, using both CNNs and Transformers, show that CLoE consistently improves performance over strong supervised and self-supervised baselines. For instance, ConvNeXt-Tiny reaches 82.5\% accuracy and a QWK of 0.894 on LIMUC with low computational cost. These results highlight the potential of difficulty-aware training strategies for improving ordinal classification under label uncertainty. Code will be released at https://github.com/zeynepozdemir/CLoE.

artificial intelligence, classification, machine learning, (16 more...)

2508.1328

Genre: Research Report > New Finding (0.93)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Neural Information Processing SystemsAug-19-2025, 23:56:46 GMT

Inherent Tradeoffs in Learning Fair Representations

Han Zhao, Geoff Gordon

In fact, a line of work has proposed to learn group-invariant representations with adversarial learning techniques in order to achieve statistical parity, also known as the demographic parity in the literature.

demographic parity, parity, representation, (16 more...)

Neural Information Processing Systems

Country:

North America > Canada > Quebec > Montreal (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre: Research Report (0.69)

Industry:

Information Technology (0.68)
Law (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(2 more...)