Qinghai Province
MUCH: A Multilingual Claim Hallucination Benchmark
Dentan, Jérémie, Canesse, Alexi, Buscaldi, Davide, Shabou, Aymen, Vanier, Sonia
Claim-level Uncertainty Quantification (UQ) is a promising approach to mitigate the lack of reliability in Large Language Models (LLMs). We introduce MUCH, the first claim-level UQ benchmark designed for fair and reproducible evaluation of future methods under realistic conditions. It includes 4,873 samples across four European languages (English, French, Spanish, and German) and four instruction-tuned open-weight LLMs. Unlike prior claim-level benchmarks, we release 24 generation logits per token, facilitating the development of future white-box methods without re-generating data. Moreover, in contrast to previous benchmarks that rely on manual or LLM-based segmentation, we propose a new deterministic algorithm capable of segmenting claims using as little as 0.2% of the LLM generation time. This makes our segmentation approach suitable for real-time monitoring of LLM outputs, ensuring that MUCH evaluates UQ methods under realistic deployment constraints. Finally, our evaluations show that current methods still have substantial room for improvement in both performance and efficiency.
- Europe > Ukraine > Kyiv Oblast > Kyiv (0.14)
- Europe > Austria > Vienna (0.14)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- (98 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Education > Health & Safety > School Nutrition (0.93)
- Health & Medicine > Consumer Health (0.93)
RAC-DMVC: Reliability-Aware Contrastive Deep Multi-View Clustering under Multi-Source Noise
Dong, Shihao, Liu, Yue, Zhou, Xiaotong, Zheng, Yuhui, Xu, Huiying, Zhu, Xinzhong
Multi-view clustering (MVC), which aims to separate the multi-view data into distinct clusters in an unsupervised manner, is a fundamental yet challenging task. To enhance its applicability in real-world scenarios, this paper addresses a more challenging task: MVC under multi-source noises, including missing noise and observation noise. To this end, we propose a novel framework, Reliability-Aware Contrastive Deep Multi-View Clustering (RAC-DMVC), which constructs a reliability graph to guide robust representation learning under noisy environments. Specifically, to address observation noise, we introduce a cross-view reconstruction to enhances robustness at the data level, and a reliability-aware noise contrastive learning to mitigates bias in positive and negative pairs selection caused by noisy representations. To handle missing noise, we design a dual-attention imputation to capture shared information across views while preserving view-specific features. In addition, a self-supervised cluster distillation module further refines the learned representations and improves the clustering performance. Extensive experiments on five benchmark datasets demonstrate that RAC-DMVC outperforms SOTA methods on multiple evaluation metrics and maintains excellent performance under varying ratios of noise.
- Europe > Austria > Vienna (0.14)
- North America > Canada > British Columbia > Vancouver (0.04)
- Asia > Macao (0.04)
- (12 more...)
- Asia > Middle East > Jordan (0.04)
- Oceania > Australia > New South Wales > Sydney (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- (6 more...)
Context-Aware Dynamic Chunking for Streaming Tibetan Speech Recognition
Wang, Chao, Cai, Yuqing, Duojie, Renzeng, Zhang, Jin, Liu, Yutong, Tashi, Nyima
ABSTRACT In this work, we propose a streaming speech recognition framework for Amdo Tibetan, built upon a hybrid CTC/Atten-tion architecture with a context-aware dynamic chunking mechanism. The proposed strategy adaptively adjusts chunk widths based on encoding states, enabling flexible receptive fields, cross-chunk information exchange, and robust adaptation to varying speaking rates, thereby alleviating the context truncation problem of fixed-chunk methods. To further capture the linguistic characteristics of Tibetan, we construct a lexicon grounded in its orthographic principles, providing linguistically motivated modeling units. During decoding, an external language model is integrated to enhance semantic consistency and improve recognition of long sentences. Experimental results show that the proposed framework achieves a word error rate (WER) of 6.23% on the test set, yielding a 48.15% relative improvement over the fixed-chunk baseline, while significantly reducing recognition latency and maintaining performance close to global decoding.
- Asia > China > Qinghai Province > Xining (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- Asia > China > Tibet Autonomous Region > Lhasa (0.04)
- Asia > China > Sichuan Province > Chengdu (0.04)
Large Language Models Do Multi-Label Classification Differently
Ma, Marcus, Chochlakis, Georgios, Pandiyan, Niyantha Maruthu, Thomason, Jesse, Narayanan, Shrikanth
Multi-label classification is prevalent in real-world settings, but the behavior of Large Language Models (LLMs) in this setting is understudied. We investigate how autoregressive LLMs perform multi-label classification, focusing on subjective tasks, by analyzing the output distributions of the models at each label generation step. We find that the initial probability distribution for the first label often does not reflect the eventual final output, even in terms of relative order and find LLMs tend to suppress all but one label at each generation step. We further observe that as model scale increases, their token distributions exhibit lower entropy and higher single-label confidence, but the internal relative ranking of the labels improves. Finetuning methods such as supervised finetuning and reinforcement learning amplify this phenomenon. We introduce the task of distribution alignment for multi-label settings: aligning LLM-derived label distributions with empirical distributions estimated from annotator responses in subjective tasks. We propose both zero-shot and supervised methods which improve both alignment and predictive performance over existing approaches. We find one method -- taking the max probability over all label generation distributions instead of just using the initial probability distribution -- improves both distribution alignment and overall F1 classification without adding any additional computation.
- North America > United States > California (0.14)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- Asia > Singapore (0.04)
- (5 more...)
A Review of End-to-End Precipitation Prediction Using Remote Sensing Data: from Divination to Machine Learning
Precipitation prediction has undergone a profound transformation -- from early symbolic and empirical methods rooted in divination and observation, to modern technologies based on atmospheric physics and artificial intelligence. This review traces the historical and technological evolution of precipitation forecasting, presenting a survey about end-to-end precipitation prediction technologies that spans ancient practices, the foundations of meteorological science, the rise of numerical weather prediction (NWP), and the emergence of machine learning (ML) and deep learning (DL) models. We first explore traditional and indigenous forecasting methods, then describe the development of physical modeling and statistical frameworks that underpin contemporary operational forecasting. Particular emphasis is placed on recent advances in neural network-based approaches, including automated deep learning, interpretability-driven design, and hybrid physical-data models. By compositing research across multiple eras and paradigms, this review not only depicts the history of end-to-end precipitation prediction but also outlines future directions in next generation forecasting systems.
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- South America > Brazil > São Paulo (0.04)
- (29 more...)
- Overview (1.00)
- Research Report > New Finding (0.67)
- Government > Regional Government > North America Government > United States Government (1.00)
- Energy (1.00)
- Information Technology (0.67)
Long-Term PM2.5 Forecasting Using a DTW-Enhanced CNN-GRU Model
Naeini, Amirali Ataee, Naeini, Arshia Ataee, Mohammadi, Fatemeh Karami, Ghaffarpasand, Omid
Reliable long-term forecasting of PM2.5 concentrations is critical for public health early-warning systems, yet existing deep learning approaches struggle to maintain prediction stability beyond 48 hours, especially in cities with sparse monitoring networks. This paper presents a deep learning framework that combines Dynamic Time Warping (DTW) for intelligent station similarity selection with a CNN-GRU architecture to enable extended-horizon PM2.5 forecasting in Isfahan, Iran, a city characterized by complex pollution dynamics and limited monitoring coverage. Unlike existing approaches that rely on computationally intensive transformer models or external simulation tools, our method integrates three key innovations: (i) DTW-based historical sampling to identify similar pollution patterns across peer stations, (ii) a lightweight CNN-GRU architecture augmented with meteorological features, and (iii) a scalable design optimized for sparse networks. Experimental validation using multi-year hourly data from eight monitoring stations demonstrates superior performance compared to state-of-the-art deep learning methods, achieving R2 = 0.91 for 24-hour forecasts. Notably, this is the first study to demonstrate stable 10-day PM2.5 forecasting (R2 = 0.73 at 240 hours) without performance degradation, addressing critical early-warning system requirements. The framework's computational efficiency and independence from external tools make it particularly suitable for deployment in resource-constrained urban environments.
- Asia > Middle East > Iran > Isfahan Province > Isfahan (0.24)
- North America > United States > California > Yolo County > Davis (0.14)
- Asia > China > Beijing > Beijing (0.05)
- (9 more...)
- Health & Medicine > Therapeutic Area (1.00)
- Health & Medicine > Public Health (0.66)
Simple and Robust Forecasting of Spatiotemporally Correlated Small Earth Data with A Tabular Foundation Model
Yang, Yuting, Mei, Gang, Ma, Zhengjing, Xu, Nengxiong, Peng, Jianbing
Small Earth data are geoscience observations with limited short-term monitoring variability, providing sparse but meaningful measurements, typically exhibiting spatiotemporal correlations. Spatiotemporal forecasting on such data is crucial for understanding geoscientific processes despite their small scale. However, conventional deep learning models for spatiotemporal forecasting requires task-specific training for different scenarios. Foundation models do not need task-specific training, but they often exhibit forecasting bias toward the global mean of the pretraining distribution. Here we propose a simple and robust approach for spatiotemporally correlated small Earth data forecasting. The essential idea is to characterize and quantify spatiotemporal patterns of small Earth data and then utilize tabular foundation models for accurate forecasting across different scenarios. Comparative results across three typical scenarios demonstrate that our forecasting approach achieves superior accuracy compared to the graph deep learning model (T -GCN) and tabular foundation model (TabPFN) in the majority of instances, exhibiting stronger robustness. Keywords: Small Earth data, Spatiotemporal correlations, Tabular foundation model, Forecasting, Deep learning 1. Introduction Small Earth data refers to geoscience time-series observations in which short-term monitoring provides limited informative variation, resulting in only sparse but meaningful measurements being available. These data predominantly possess spatiotemporal correlations. Despite their small scale, forecasting on such data is of critical importance for understanding geoscientific processes (Saad et al., 2024; Y u et al., 2024).
Robustness in Both Domains: CLIP Needs a Robust Text Encoder
Rocamora, Elias Abad, Schlarmann, Christian, Singh, Naman Deep, Wu, Yongtao, Hein, Matthias, Cevher, Volkan
Adversarial input attacks can cause a significant shift of CLIP embeddings. This can affect the downstream robustness of models incorporating CLIP in the pipeline, such as text-to-image generative models or large vision language models. While some efforts have been done towards making the CLIP image encoders robust, the robustness of text encoders remains unexplored. In this work, we cover this gap in the literature. We propose LEAF: an efficient adversarial fine-tuning method for the text domain, with the ability to scale to large CLIP models. Our models significantly improve the zero-shot adversarial accuracy in the text domain, while maintaining the vision performance provided by robust image encoders. When combined with text-to-image diffusion models, we can improve the generation quality under adversarial noise. In multimodal retrieval tasks, LEAF improves the recall under adversarial noise over standard CLIP models. Finally, we show that robust text encoders facilitate better reconstruction of input text from its embedding via direct optimization.
- North America > United States (0.14)
- North America > Canada > Ontario > Toronto (0.14)
- Europe > Switzerland > Zürich > Zürich (0.14)
- (8 more...)