AITopics

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing Systems, Feb-16-2026, 15:53:45 GMT

b3640c2d3e58f716c67066046318db0f-Paper-Datasets_and_Benchmarks.pdf

artificial intelligence, machine learning, occlusion

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > Illinois > Champaign County > Champaign (0.04)
Europe > Greece > Attica > Athens (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)

Genre: Research Report (0.67)

Industry:

Government (0.93)
Health & Medicine (0.92)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)

Neural Information Processing Systems, Feb-16-2026, 12:33:26 GMT

YouTubePD: A Multimodal Benchmark for Parkinson's Disease Analysis (Supplementary Material)

We include all our annotations and extracted landmarks. This ensures that we uphold the highest standards of ethical data usage. In Table A1, we summarize the severity label distribution in YouTubePD. We also summarize the demographic distribution in YouTubePD, split between PD-positive and healthy control (HC), or PD-negative, subjects. This decision is based on the clinician's suggestion, since an accurate UPDRS facial expression rating would require more […] This strategy also allows for a finer classification.

artificial intelligence, machine learning, public figure

Country:

North America > Canada (0.05)
North America > Mexico (0.04)
Europe > United Kingdom (0.04)

Industry:

Health & Medicine > Therapeutic Area > Neurology > Parkinson's Disease (1.00)
Health & Medicine > Therapeutic Area > Musculoskeletal (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Neural Information Processing Systems, Feb-10-2026, 20:36:58 GMT

944ecf65a46feb578a43abfd5cddd960-Supplemental-Conference.pdf

artificial intelligence, epoch, machine learning

Industry: Health & Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.37)

Neural Information Processing Systems, Feb-8-2026, 09:06:09 GMT

Supplement: Single Model Uncertainty Estimation via Stochastic Data Centering (Appendix)

For demonstration, let us consider the 1D regression example shown in Figure 1 and train UQ models under different training-set sizes (5, 10, 50, and 200, respectively). The figure illustrates the predicted function and the associated uncertainty estimates (shaded region around the predictions). The accompanying pseudocode takes two inputs: model, a network trained with anchoring, and anchors, a set of randomly chosen anchors (ideally drawn from the training distribution). For each case, we show the negative log-likelihood on the test data obtained with each of the methods. Note that all metrics were computed as an average over 20 random trials of a 0.8/0.2 train-test split.
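The two pseudocode inputs (a network trained with anchoring and a set of anchors) point to the usual anchored-inference loop: re-center each test input on every anchor, predict, and read the spread across anchors as uncertainty. The NumPy sketch below is a minimal illustration under stated assumptions, not the authors' code: the anchored input is assumed to be the concatenation [x - a, a], the model is any callable mapping an (N, 2D) array to N predictions, and the name anchored_predict is hypothetical.

```python
import numpy as np


def anchored_predict(model, x, anchors):
    """Hypothetical anchored inference: mean and spread of predictions over anchors.

    x:       (N, D) test inputs
    anchors: (K, D) anchors, ideally sampled from the training distribution
    model:   callable mapping an (N, 2D) anchored input to N predictions
             (assumed to be a network trained on [x - a, a] inputs)
    """
    x = np.asarray(x, dtype=float)
    preds = []
    for a in np.asarray(anchors, dtype=float):
        a_tiled = np.broadcast_to(a, x.shape)
        # Re-center the input on the anchor and pass the anchor alongside it.
        z = np.concatenate([x - a_tiled, a_tiled], axis=1)
        preds.append(np.asarray(model(z), dtype=float).reshape(-1))
    preds = np.stack(preds)  # (K, N)
    # Mean over anchors is the prediction; std over anchors is the uncertainty.
    return preds.mean(axis=0), preds.std(axis=0)


# Toy check: a model that exactly undoes the centering is anchor-invariant,
# so its uncertainty is (numerically) zero; a model ignoring the anchor is not.
rng = np.random.default_rng(0)
x_test = np.linspace(-1.0, 1.0, 5).reshape(-1, 1)
anchors = rng.normal(size=(8, 1))
invariant = lambda z: z[:, 0] + z[:, 1]  # (x - a) + a == x
mean_inv, std_inv = anchored_predict(invariant, x_test, anchors)
biased = lambda z: z[:, 0]  # uses (x - a) only
mean_b, std_b = anchored_predict(biased, x_test, anchors)
```

In a trained anchored network the spread across anchors is nonzero where the model has not learned to be consistent across anchors, which is the intuition behind using it as an uncertainty estimate.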

anchor, artificial intelligence, machine learning

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Phukon, Bornali, Zheng, Xiuwen, Hasegawa-Johnson, Mark

Aligning ASR Evaluation with Human and LLM Judgments: Intelligibility Metrics Using Phonetic, Semantic, and NLI Approaches

arXiv.org Artificial Intelligence, Dec-12-2025

Traditional ASR metrics like word error rate (WER) and character error rate (CER) fail to capture intelligibility, especially for dysarthric and dysphonic speech, where semantic alignment matters more than exact word matches. ASR systems struggle with these speech types, often producing errors like phoneme repetitions and imprecise consonants, yet the meaning remains clear to human listeners. We identify two key challenges: (1) existing metrics do not adequately reflect intelligibility, and (2) while LLMs can refine ASR output, their effectiveness in correcting ASR transcripts of dysarthric speech remains underexplored. To address this, we propose a novel metric integrating Natural Language Inference (NLI) scores, semantic similarity, and phonetic similarity. Our ASR evaluation metric achieves a 0.890 correlation with human judgments on Speech Accessibility Project data, surpassing traditional methods and emphasizing the need to prioritize intelligibility over error-based measures.
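To make the contrast concrete, the sketch below computes WER as the word-level edit distance divided by the reference length, and shows how repetition errors of the kind described above raise WER even though the meaning is unchanged. The composite_score function is a hypothetical illustration of combining NLI, semantic, and phonetic similarity scores by weighted average; the equal weights and the assumption that each score already lies in [0, 1] are placeholders, not the paper's formulation.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: Levenshtein distance over words / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def composite_score(nli: float, semantic: float, phonetic: float,
                    weights=(1 / 3, 1 / 3, 1 / 3)) -> float:
    """Hypothetical weighted average of three similarity scores in [0, 1]."""
    scores = (nli, semantic, phonetic)
    if not all(0.0 <= s <= 1.0 for s in scores):
        raise ValueError("scores are assumed to lie in [0, 1]")
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


# Repetitions preserve the meaning but still count as two insertions.
ref = "i want a cup of tea"
hyp = "i i want a cup of of tea"
example_wer = wer(ref, hyp)  # 2 insertions / 6 reference words
```

A meaning-oriented score such as the hypothetical composite above would stay high for this pair, which is the gap between error-based and intelligibility-based evaluation that the abstract describes.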

large language model, machine learning, natural language

arXiv:2506.16528

Country: North America > United States > Illinois (0.16)

Genre: Research Report > New Finding (0.47)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

arXiv.org Artificial Intelligence, Dec-4-2025

VLSU: Mapping the Limits of Joint Multimodal Understanding for AI Safety

Palaskar, Shruti, Gatys, Leon, Abdelrahman, Mona, Jacobo, Mar, Lindsey, Larry, Moharir, Rutika, Lund, Gunnar, Xu, Yang, Shiee, Navid, Bigham, Jeffrey, Maalouf, Charles, Cheng, Joseph Yitan

Safety evaluation of multimodal foundation models often treats vision and language inputs separately, missing risks from joint interpretation, where individually benign content becomes harmful in combination. Existing approaches also fail to distinguish clearly unsafe content from borderline cases, leading to problematic over-blocking of borderline content or under-refusal of genuinely harmful content. We present Vision Language Safety Understanding (VLSU), a comprehensive framework to systematically evaluate multimodal safety through fine-grained severity classification and combinatorial analysis across 17 distinct safety patterns. Using a multi-stage pipeline with real-world images and human annotation, we construct a large-scale benchmark of 8,187 samples spanning 15 harm categories. Our evaluation of eleven state-of-the-art models reveals systematic joint understanding failures: while models achieve 90% or higher accuracy on clear unimodal safety signals, performance degrades substantially, to 20-55%, when joint image-text reasoning is required to determine the safety label. Most critically, 34% of errors in joint image-text safety classification occur despite correct classification of the individual modalities, further evidence that the models lack compositional reasoning. Additionally, we find that models struggle to balance refusing unsafe content with still responding to borderline cases that deserve engagement. For example, instruction framing can reduce the over-blocking rate on borderline content in Gemini-1.5 from 62.4% to 10.4%, but only at the cost of under-refusing unsafe content, with the refusal rate dropping from 90.8% to 53.9%. Overall, our framework exposes weaknesses in joint image-text understanding and alignment gaps in current models, and provides a critical test bed to enable the next milestones in research on robust vision-language safety.
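The over-blocking versus refusal trade-off, and the share of joint errors made despite correct unimodal labels, are simple conditional rates. The sketch below shows one plausible way to compute all three from per-sample records; the Record fields, label strings, and toy data are hypothetical and do not reflect the benchmark's actual schema or results.

```python
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical per-sample record; field names are illustrative only.
    gold: str            # gold joint label: "safe", "borderline", or "unsafe"
    refused: bool        # whether the model refused to respond
    image_correct: bool  # image-only safety label was classified correctly
    text_correct: bool   # text-only safety label was classified correctly
    joint_correct: bool  # joint image-text safety label was classified correctly


def rate(flags):
    flags = list(flags)
    return sum(flags) / len(flags) if flags else float("nan")


def over_blocking_rate(records):
    """Share of borderline samples the model refused (lower is better)."""
    return rate(r.refused for r in records if r.gold == "borderline")


def unsafe_refusal_rate(records):
    """Share of unsafe samples the model refused (higher is better)."""
    return rate(r.refused for r in records if r.gold == "unsafe")


def compositional_failure_share(records):
    """Among joint-label errors, share where both unimodal labels were correct."""
    return rate(r.image_correct and r.text_correct
                for r in records if not r.joint_correct)


records = [
    # gold, refused, image_correct, text_correct, joint_correct
    Record("borderline", True, True, True, True),
    Record("borderline", False, True, True, True),
    Record("borderline", False, True, True, False),
    Record("unsafe", True, True, True, False),
    Record("unsafe", True, False, True, False),
    Record("unsafe", False, True, True, True),
    Record("safe", False, True, True, True),
]
```

Tracking the two refusal rates together makes the trade-off visible: a prompt change that lowers over_blocking_rate is only a net gain if unsafe_refusal_rate does not fall with it.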

category, large language model, machine learning

arXiv:2510.18214

Country:

North America > United States (0.46)
North America > Mexico (0.28)

Genre: Research Report (1.00)

Industry:

Law (1.00)
Health & Medicine (0.70)
Government (0.67)
Law Enforcement & Public Safety > Terrorism (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Vision (0.93)
Information Technology > Sensing and Signal Processing > Image Processing (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

Singh, Rishu Kumar, Shreya, Navneet, Das, Sarmistha, Singh, Apoorva, Saha, Sriparna

Talk, Snap, Complain: Validation-Aware Multimodal Expert Framework for Fine-Grained Customer Grievances

arXiv.org Artificial Intelligence, Nov-19-2025

Existing approaches to complaint analysis largely rely on unimodal, short-form content such as tweets or product reviews. This work advances the field by leveraging multimodal, multi-turn customer support dialogues, in which users often share both textual complaints and visual evidence (e.g., screenshots, product photos), to enable fine-grained classification of complaint aspects and severity. We introduce VALOR, a Validation-Aware Learner with Expert Routing, tailored for this multimodal setting. It employs a multi-expert reasoning setup using large-scale generative models with Chain-of-Thought (CoT) prompting for nuanced decision-making. To ensure coherence between modalities, a semantic alignment score is computed and integrated into the final classification through a meta-fusion strategy. In alignment with the United Nations Sustainable Development Goals (UN SDGs), the proposed framework supports SDG 9 (Industry, Innovation and Infrastructure) by advancing AI-driven tools for robust, scalable, and context-aware service infrastructure. Further, by enabling structured analysis of complaint narratives and visual context, it contributes to SDG 12 (Responsible Consumption and Production) by promoting more responsive product design and improved accountability in consumer services. We evaluate VALOR on a curated multimodal complaint dataset annotated with fine-grained aspect and severity labels, showing that it consistently outperforms baseline models, especially in complex complaint scenarios where information is distributed across text and images. This study underscores the value of multimodal interaction and expert validation in practical complaint understanding systems. Data and code are available at: https://github.com/sarmistha-D/VALOR
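The abstract does not specify how the semantic alignment score enters the meta-fusion step. As a purely hypothetical illustration, the sketch below uses cosine similarity between a text embedding and an image embedding as the alignment score, rescales it to [0, 1], and uses it to blend the averaged multimodal expert distribution with a text-only distribution, so that poorly aligned visual evidence is trusted less. The embeddings, the text-only fallback, the blending rule, and the function names are all assumptions, not VALOR's published method.

```python
import numpy as np


def alignment_score(text_emb, image_emb) -> float:
    """Cosine similarity between text and image embeddings, rescaled to [0, 1]."""
    t = np.asarray(text_emb, dtype=float)
    v = np.asarray(image_emb, dtype=float)
    cos = float(t @ v / (np.linalg.norm(t) * np.linalg.norm(v)))
    return (cos + 1.0) / 2.0


def meta_fuse(expert_probs, text_only_probs, align: float) -> np.ndarray:
    """Hypothetical fusion: blend multimodal experts with a text-only prediction.

    expert_probs:    class-probability vectors, one per multimodal expert
    text_only_probs: class-probability vector from a text-only classifier
    align:           alignment score in [0, 1]; higher means trust the experts more
    """
    ensemble = np.mean(np.stack([np.asarray(p, dtype=float) for p in expert_probs]), axis=0)
    fused = align * ensemble + (1.0 - align) * np.asarray(text_only_probs, dtype=float)
    return fused / fused.sum()  # renormalize to a probability distribution


# Toy example with three hypothetical severity classes (low, medium, high).
experts = [[0.1, 0.2, 0.7], [0.2, 0.2, 0.6]]
text_only = [0.6, 0.3, 0.1]
a = alignment_score([0.9, 0.1, 0.4], [0.8, 0.2, 0.5])
probs = meta_fuse(experts, text_only, a)
predicted_class = int(np.argmax(probs))
```

A learned meta-classifier (stacking) over the expert outputs plus the alignment score would be an equally plausible reading of "meta-fusion"; the fixed convex blend here is only the simplest instance.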

large language model, machine learning, natural language

arXiv:2511.14693

Country:

Europe > Austria (0.28)
North America > United States (0.28)

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Industry: Information Technology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Communications > Social Media (0.94)

arXiv.org Artificial Intelligence, Nov-6-2025

Test Time Adaptation Using Adaptive Quantile Recalibration

Mehrbod, Paria, Vianna, Pedro, Nanfack, Geraldin, Wolf, Guy, Belilovsky, Eugene

Domain adaptation is a key strategy for enhancing the generalizability of deep learning models in real-world scenarios, where test distributions often diverge significantly from the training domain. However, conventional approaches typically rely on prior knowledge of the target domain or require model retraining, limiting their practicality in dynamic or resource-constrained environments. Recent test-time adaptation methods based on batch normalization statistic updates allow for unsupervised adaptation, but they often fail to capture complex activation distributions and are constrained to specific normalization layers. We propose Adaptive Quantile Recalibration (AQR), a test-time adaptation technique that modifies pre-activation distributions by aligning quantiles on a channel-wise basis. AQR captures the full shape of activation distributions and generalizes across architectures employing BatchNorm, GroupNorm, or LayerNorm. To address the challenge of estimating distribution tails under varying batch sizes, AQR incorporates a robust tail calibration strategy that improves stability and precision. Our method leverages source-domain statistics computed at training time, enabling unsupervised adaptation without retraining models. Experiments on CIFAR-10-C, CIFAR-100-C, and ImageNet-C across multiple architectures demonstrate that AQR achieves robust adaptation across diverse settings, outperforming existing test-time adaptation baselines. These results highlight AQR's potential for deployment in real-world scenarios with dynamic and unpredictable data distributions.
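Channel-wise quantile alignment can be illustrated with classic quantile mapping: store per-channel quantiles of the source pre-activations at training time, estimate the same quantiles on a test batch, and map each test value through the piecewise-linear function that sends test quantiles to source quantiles. The NumPy sketch below does only that. It assumes a 2-D (batch, channels) pre-activation layout, uses hypothetical function names, and treats the tails naively (the most extreme test values are mapped onto the extreme stored source quantiles) instead of reproducing AQR's robust tail calibration, so it is not the paper's implementation.

```python
import numpy as np


def fit_source_quantiles(source_acts, levels):
    """Per-channel quantiles of source-domain pre-activations, shape (L, C)."""
    return np.quantile(np.asarray(source_acts, dtype=float), levels, axis=0)


def recalibrate(test_acts, source_q, levels):
    """Map each channel of a test batch onto the stored source quantiles.

    test_acts: (N, C) pre-activations from the test batch
    source_q:  (L, C) source quantiles from fit_source_quantiles
    levels:    (L,) increasing quantile levels in [0, 1]
    """
    test_acts = np.asarray(test_acts, dtype=float)
    test_q = np.quantile(test_acts, levels, axis=0)  # (L, C) batch quantiles
    out = np.empty_like(test_acts)
    for c in range(test_acts.shape[1]):
        # Piecewise-linear map: k-th test quantile -> k-th source quantile.
        out[:, c] = np.interp(test_acts[:, c], test_q[:, c], source_q[:, c])
    return out


rng = np.random.default_rng(0)
levels = np.linspace(0.0, 1.0, 101)
source = rng.normal(0.0, 1.0, size=(5000, 2))                  # training-time activations
shifted = rng.normal([3.0, -1.0], [2.0, 0.5], size=(5000, 2))  # shifted test batch
source_q = fit_source_quantiles(source, levels)
aligned = recalibrate(shifted, source_q, levels)
```

Unlike re-estimating only a mean and variance, as batch-normalization statistic updates do, the quantile map also corrects skew and other shape differences, which is the motivation the abstract gives for matching the full activation distribution. With small test batches the extreme quantiles become noisy, which is exactly where a dedicated tail calibration matters.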

artificial intelligence, machine learning, percentile

arXiv:2511.03148

Country: North America > Canada (0.28)

Genre:

Research Report > New Finding (0.68)
Research Report > Promising Solution (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)