AITopics

arXiv.org Artificial Intelligence, Dec-4-2025

DVPO: Distributional Value Modeling-based Policy Optimization for LLM Post-Training

Zhu, Dingwei, Xi, Zhiheng, Dou, Shihan, Wang, Yuhui, Li, Sixian, Ye, Junjie, Guo, Honglin, Liu, Shichun, Huang, Chenhao, Yang, Yajie, Shang, Junlin, Jin, Senjie, Zhang, Ming, Zhang, Jiazheng, Huang, Caishuang, Zhang, Yunke, Yan, Demei, Wang, Yuran, Gui, Tao

Reinforcement learning (RL) has shown strong performance in LLM post-training, but real-world deployment often involves noisy or incomplete supervision. In such settings, complex and unreliable supervision signals can destabilize training and harm generalization. While existing approaches such as worst-case optimization (e.g., RFQI, CQL) and mean-based methods (e.g., PPO, GRPO) can improve stability, they often overlook generalization and may produce overly conservative policies, leading to uneven performance across diverse real scenarios. To this end, we introduce DVPO (Distributional Value Modeling with Risk-aware Policy Optimization), a new RL framework that combines conditional risk theory with distributional value modeling to better balance robustness and generalization. DVPO learns token-level value distributions to provide fine-grained supervision, and applies an asymmetric risk regularization to shape the distribution tails: it contracts the lower tail to dampen noisy negative deviations, while expanding the upper tail to preserve exploratory diversity. Across extensive experiments and analysis in multi-turn dialogue, math reasoning, and scientific QA, DVPO consistently outperforms PPO, GRPO, and robust Bellman-based PPO under noisy supervision, showing its potential for LLM post-training in the real world.

large language model, machine learning, reinforcement learning, (18 more...)

2512.03847

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)

Neural Information Processing Systems, Nov-20-2025, 22:48:26 GMT

Masking: A New Perspective of Noisy Supervision

Learning classifiers from training data with noisy labels is an important problem. In the most popular noise model to date, noisy labels are corrupted from ground-truth labels by an unknown noise transition matrix, so estimating this matrix lets classifiers avoid overfitting the noisy labels. However, such estimation is difficult in practice, owing either to the indirect nature of two-step approaches or to data that is too limited to support end-to-end approaches. In this paper, we propose a human-assisted approach called "Masking" that conveys human cognition of invalid class transitions and naturally speculates the structure of the noise transition matrix. To this end, we derive a structure-aware probabilistic model incorporating a structure prior, and solve the challenges from structure extraction and structure alignment. Thanks to Masking, we only estimate unmasked noise transition probabilities and the burden of estimation is tremendously reduced. We conduct extensive experiments on CIFAR-10 and CIFAR-100 with three noise structures as well as the industrial-level Clothing1M with agnostic noise structure, and the results show that Masking can improve the robustness of classifiers significantly.
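As a rough illustration of the noise-transition idea in this abstract, the sketch below maps clean class probabilities to noisy-label probabilities through a row-stochastic transition matrix, then zeroes out transitions that a human has marked as invalid. It is not the paper's structure-aware probabilistic model. The function names, the 3-class matrix, and the simple zero-and-renormalize step are all assumptions made for illustration.

```python
def apply_mask(transition, mask):
    """Zero out transitions marked invalid (mask entry 0) and renormalize each row.

    Assumes every row keeps at least one unmasked entry (e.g. the diagonal),
    so the row sum is never zero.
    """
    masked = [[t * m for t, m in zip(t_row, m_row)]
              for t_row, m_row in zip(transition, mask)]
    return [[v / sum(row) for v in row] for row in masked]


def noisy_posterior(clean_probs, transition):
    """Return P(noisy label = j) = sum_i P(true = i) * transition[i][j]."""
    n_noisy = len(transition[0])
    return [sum(clean_probs[i] * transition[i][j]
                for i in range(len(clean_probs)))
            for j in range(n_noisy)]


# Hypothetical 3-class example: transition[i][j] is the assumed probability
# that true class i is observed as noisy label j.
T = [[0.8, 0.1, 0.1],
     [0.1, 0.8, 0.1],
     [0.1, 0.1, 0.8]]

# A human marks flips between class 0 and class 2 as invalid.
mask = [[1, 1, 0],
        [1, 1, 1],
        [0, 1, 1]]

T_masked = apply_mask(T, mask)
noisy = noisy_posterior([1.0, 0.0, 0.0], T_masked)
```

With the mask applied, only the unmasked entries would remain to be estimated, which is the reduction in estimation burden the abstract describes.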

masking, name change, new perspective, (6 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.77)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.60)

arXiv.org Artificial Intelligence, Oct-23-2025

Learning Noise-Resilient and Transferable Graph-Text Alignment via Dynamic Quality Assessment

Liu, Yuhang, Shao, Minglai, Wo, Zengyi, Chu, Yunlong, Hao, Bing, Liu, Shengzhong, Wang, Ruijie, Li, Jianxin

Pre-training Graph Foundation Models (GFMs) on text-attributed graphs (TAGs) is central to web-scale applications such as search, recommendation, and knowledge discovery. However, existing CLIP-style graph-text aligners face two key limitations: they assume strict one-to-one correspondences between nodes and texts, overlooking the inherent many-to-many relations in real-world graphs; and they rely on static alignment objectives that cannot adapt to varying data quality, making them brittle under noisy supervision. Together, these limitations expose a core dilemma: embracing expressive many-to-many alignment amplifies noise, while reverting to strict one-to-one strategies sacrifices semantic diversity and fails to handle inherently mismatched pairs. To address these challenges, we propose ADAligner, a dynamic, quality-aware graph-text alignment framework that dynamically adjusts between expressive many-to-many and conservative one-to-one objectives according to supervision quality. ADAligner estimates batch-level alignment reliability in real time and adapts its optimization accordingly, promoting soft, subgraph-level many-to-many alignment when supervision is clean, while emphasizing reliable one-to-one alignment by dynamically filtering low-confidence pairs under noise. Theoretically, we prove that this dynamic mechanism forms a stable negative feedback process, ensuring convergence and robustness. Comprehensive experiments on nine diverse TAG datasets demonstrate that ADAligner consistently outperforms prior graph-text aligners on zero-/few-shot node classification, link prediction and cross-modal retrieval tasks. It maintains strong robustness under noisy supervision and accelerates pre-training by approximately 2 to 3 times compared to multimodal baselines, establishing a scalable and reliable foundation for graph-text representation learning in real-world web environments.

adaligner, machine learning, natural language, (18 more...)

2510.19384

Country: Asia > China (0.47)

Genre: Research Report (0.82)

Industry: Information Technology (0.67)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
(3 more...)

arXiv.org Artificial Intelligence, Oct-13-2025

Exploring Single Domain Generalization of LiDAR-based Semantic Segmentation under Imperfect Labels

Kong, Weitong, Zeng, Zichao, Wen, Di, Wei, Jiale, Peng, Kunyu, Goo, June Moh, Boehm, Jan, Stiefelhagen, Rainer

Accurate perception is critical for vehicle safety, with LiDAR as a key enabler in autonomous driving. To ensure robust performance across environments, sensor types, and weather conditions without costly re-annotation, domain generalization in LiDAR-based 3D semantic segmentation is essential. However, LiDAR annotations are often noisy due to sensor imperfections, occlusions, and human errors. Such noise degrades segmentation accuracy and is further amplified under domain shifts, threatening system reliability. While noisy-label learning is well-studied in images, its extension to 3D LiDAR segmentation under domain generalization remains largely unexplored, as the sparse and irregular structure of point clouds limits direct use of 2D methods. To address this gap, we introduce the novel task Domain Generalization for LiDAR Semantic Segmentation under Noisy Labels (DGLSS-NL) and establish the first benchmark by adapting three representative noisy-label learning strategies from image classification to 3D segmentation. However, we find that existing noisy-label learning approaches adapt poorly to LiDAR data. We therefore propose DuNe, a dual-view framework with strong and weak branches that enforce feature-level consistency and apply cross-entropy loss based on confidence-aware filtering of predictions. Our approach shows state-of-the-art performance by achieving 56.86% mIoU on SemanticKITTI, 42.28% on nuScenes, and 52.58% on SemanticPOSS under 10% symmetric label noise, with an overall Arithmetic Mean (AM) of 49.57% and Harmonic Mean (HM) of 48.50%, thereby demonstrating robust domain generalization in DGLSS-NL tasks. The code is available on our project page.

artificial intelligence, machine learning, segmentation, (15 more...)

2510.09035

Country: Europe (0.28)

Genre: Research Report (0.82)

Industry:

Information Technology (0.49)
Transportation > Ground > Road (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Neural Information Processing Systems, Oct-10-2024, 17:28:27 GMT

Learning to Generate Visual Questions with Noisy Supervision

generate visual question, noisy supervision, visual hint, (2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.61)

arXiv.org Artificial Intelligence, Feb-15-2023

Search-Engine-augmented Dialogue Response Generation with Cheaply Supervised Query Production

Wang, Ante, Song, Linfeng, Liu, Qi, Mi, Haitao, Wang, Longyue, Tu, Zhaopeng, Su, Jinsong, Yu, Dong

Knowledge-aided dialogue response generation aims at augmenting chatbots with relevant external knowledge in the hope of generating more informative responses. The majority of previous work assumes that the relevant knowledge is given as input or retrieved from a static pool of knowledge. However, this assumption does not hold in real-world settings, where knowledge is continually updated and a chatbot has to dynamically retrieve useful knowledge. We propose a dialogue model that can access the vast and dynamic information from any search engine for response generation. As the core module, a query producer is used to generate queries from a dialogue context to interact with a search engine. We design a training algorithm using cheap noisy supervision for the query producer, where the signals are obtained by comparing retrieved articles with the next dialogue response. As a result, the query producer is adjusted without any human annotation of gold queries, making it easily transferable to other domains and search engines. Experiments show that our query producer can achieve R@1 and R@5 rates of 62.4% and 74.8% for retrieving gold knowledge, and the overall model generates better responses than strong knowledge-aided baselines using BART and other typical systems.
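The R@1 and R@5 figures in this abstract are recall-at-k rates. A minimal sketch of how such a rate is commonly computed follows, assuming one gold knowledge item per query and a ranked list of retrieved item IDs. The function names and toy data are hypothetical, and the paper's exact evaluation protocol is not given here.

```python
def recall_at_k(ranked_ids, gold_id, k):
    """Return 1.0 if the gold item appears in the top-k retrieved items, else 0.0."""
    return 1.0 if gold_id in ranked_ids[:k] else 0.0


def mean_recall_at_k(examples, k):
    """Average recall@k over (ranked_ids, gold_id) pairs."""
    hits = [recall_at_k(ranked, gold, k) for ranked, gold in examples]
    return sum(hits) / len(hits)


# Toy data: each tuple is (retrieved IDs in rank order, gold ID).
examples = [
    (["a", "b", "c"], "a"),  # gold at rank 1
    (["b", "a", "c"], "a"),  # gold at rank 2
    (["c", "b", "d"], "a"),  # gold not retrieved
    (["a", "c", "b"], "a"),  # gold at rank 1
]

r_at_1 = mean_recall_at_k(examples, 1)
r_at_5 = mean_recall_at_k(examples, 5)
```

R@5 can never be lower than R@1 on the same data, which is consistent with the 62.4% and 74.8% rates reported above.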

artificial intelligence, information retrieval, natural language, (15 more...)

doi: 10.1016/j.artint.2023.103874

2302.093

Country:

North America > United States > Mississippi > Lee County > Tupelo (0.04)
North America > United States > Tennessee > Davidson County > Nashville (0.04)
North America > Canada > Ontario > Toronto (0.04)
(5 more...)

Genre: Research Report (0.64)

Industry:

Leisure & Entertainment (1.00)
Media (0.68)

Technology: Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)

#artificialintelligence, Apr-28-2020, 21:38:09 GMT

New image recognition method proposed based on large-scale dataset

Researchers from the Shenzhen Institutes of Advanced Technology (SIAT) of the Chinese Academy of Sciences have proposed a product image recognition method with guidance learning and noisy supervision. The study was published in Computer Vision and Image Understanding. Instead of collecting product images by laborious and time-intensive image capturing, the team introduced a novel large-scale dataset called Product-90. Consisting of more than 140K images with 90 categories, the dataset was related to Clothing1M (a large-scale public dataset designed for learning from noisy data with human supervision), but contained many more categories. Images were collected from reviews on e-commerce websites.

dataset, image recognition method, large-scale dataset, (6 more...)

Country: Asia > China > Guangdong Province > Shenzhen (0.28)

Genre: Research Report (1.00)

Industry: Information Technology > Services > e-Commerce Services (0.62)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (0.64)

Neural Information Processing Systems, Feb-14-2020, 17:28:04 GMT

Masking: A New Perspective of Noisy Supervision

Han, Bo, Yao, Jiangchao, Niu, Gang, Zhou, Mingyuan, Tsang, Ivor, Zhang, Ya, Sugiyama, Masashi

artificial intelligence, machine learning, masking, (6 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.87)

arXiv.org Machine Learning, Mar-12-2019

Noisy Supervision for Correcting Misaligned Cadaster Maps Without Perfect Ground Truth Data

Girard, Nicolas, Charpiat, Guillaume, Tarabalka, Yuliya

In machine learning, the best performance on a certain task is achieved by fully supervised methods when perfect ground truth labels are available. However, labels are often noisy, especially in remote sensing where manually curated public datasets are rare. We study the multi-modal cadaster map alignment problem for which available annotations are misaligned polygons, resulting in noisy supervision. We subsequently set up a multiple-rounds training scheme which corrects the ground truth annotations at each round to better train the model at the next round. We show that it is possible to reduce the noise of the dataset by iteratively training a better ...

Figure 1: Qualitative alignment results on a crop of an image of Bloomington from the Inria dataset.

annotation, artificial intelligence, machine learning, (17 more...)

1903.06529

Country:

North America > United States (0.04)
Europe > France > Provence-Alpes-Côte d'Azur (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.49)