AITopics

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.66)

Neural Information Processing SystemsFeb-11-2026, 10:37:03 GMT

d9fc0cdb67638d50f411432d0d41d0ba-Supplemental.pdf

artificial intelligence, machine learning, ssr, (15 more...)

Country: North America > United States > California (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing SystemsFeb-8-2026, 01:07:08 GMT

Near-OptimalRandomizedExplorationforTabular MarkovDecisionProcesses

These algorithms inject (carefully tuned) random noise to value function to encourage exploration. UCB-type algorithms enjoy well-established theoretical guarantees but suffer from difficult implementation since an upper confidence bound isusually infeasible for manypractical models like neural networks. Instead, practitioners prefer randomized exploration such as noisy networks in [19], and algorithms with randomized exploration have been widely used in practice [37,13,11,35].

artificial intelligence, arxivpreprintarxiv, machine learning, (18 more...)

Country:

North America > United States > California (0.04)
Europe > United Kingdom > England (0.04)
Europe > Romania > Sud-Est Development Region > Constanța County > Constanța (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Zurich, Arie Wortsman, Gerace, Federica, Loureiro, Bruno, Lu, Yue M.

A Random Matrix Theory of Masked Self-Supervised Regression

arXiv.org Machine LearningFeb-2-2026

Self-supervised learning (SSL) -- a training paradigm in which models learn useful representations from unlabeled data by exploiting the data itself as a source of supervision -- has emerged as a foundational component of the recent success of transformer architectures. By avoiding the need for manual annotations, SSL retains many of the benefits traditionally associated with supervised learning while avoiding reliance on labeled data. Consequently, SSL is widely adopted as a pretraining paradigm for learning general-purpose representations that substantially accelerate the optimization of downstream tasks, especially in data-scarce settings. A canonical example of a self-supervised learning task is masked language modeling (MLM), in which a neural network is trained to predict masked tokens in text using the remaining tokens as contextual information (Devlin et al., 2019a; Howard and Ruder, 2018; Radford et al., 2018; Brown et al., 2020; OpenAI, 2024). For example, given the sentence "The capital of France is Paris", a typical MLM task would be to teach the model to infer that we are speaking about the capital of a country from the context "France" and "Paris" from the masked sentence "The [MASK] of France is Paris".

artificial intelligence, machine learning, matrix, (19 more...)

arXiv.org Machine Learning

2601.23208

Country:

Europe > France (0.74)
North America > United States (0.14)
Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.04)
(5 more...)

Genre: Research Report (0.64)

Industry: Government > Regional Government (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)

arXiv.org Artificial IntelligenceNov-14-2025

SSR: Socratic Self-Refine for Large Language Model Reasoning

Shi, Haizhou, Liu, Ye, Pang, Bo, Liu, Zeyu Leo, Wang, Hao, Savarese, Silvio, Xiong, Caiming, Zhou, Yingbo, Yavuz, Semih

Large Language Models (LLMs) have demonstrated remarkable reasoning abilities, yet existing test-time frameworks often rely on coarse self-verification and self-correction, limiting their effectiveness on complex tasks. In this paper, we propose Socratic Self-Refine (SSR), a novel framework for fine-grained evaluation and precise refinement of LLM reasoning. Our proposed SSR decomposes model responses into verifiable (sub-question, sub-answer) pairs, enabling step-level confidence estimation through controlled re-solving and self-consistency checks. By pinpointing unreliable steps and iteratively refining them, SSR produces more accurate and interpretable reasoning chains. Empirical results across five reasoning benchmarks and three LLMs show that SSR consistently outperforms state-of-the-art iterative self-refinement baselines. Beyond performance gains, SSR provides a principled black-box approach for evaluating and understanding the internal reasoning processes of LLMs. Code is available at https://github.com/SalesforceAIResearch/socratic-self-refine-reasoning.

large language model, machine learning, natural language, (19 more...)

2511.10621

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

arXiv.org Artificial IntelligenceOct-28-2025

LLMs Reproduce Human Purchase Intent via Semantic Similarity Elicitation of Likert Ratings

Maier, Benjamin F., Aslak, Ulf, Fiaschi, Luca, Rismal, Nina, Fletcher, Kemble, Luhmann, Christian C., Dow, Robbie, Pappas, Kli, Wiecki, Thomas V.

Consumer research costs companies billions annually yet suffers from panel biases and limited scale. Large language models (LLMs) offer an alternative by simulating synthetic consumers, but produce unrealistic response distributions when asked directly for numerical ratings. We present semantic similarity rating (SSR), a method that elicits textual responses from LLMs and maps these to Likert distributions using embedding similarity to reference statements. Testing on an extensive dataset comprising 57 personal care product surveys conducted by a leading corporation in that market (9,300 human responses), SSR achieves 90% of human test-retest reliability while maintaining realistic response distributions (KS similarity > 0.85). Additionally, these synthetic respondents provide rich qualitative feedback explaining their ratings. This framework enables scalable consumer research simulations while preserving traditional survey metrics and interpretability.

large language model, machine learning, natural language, (16 more...)

2510.08338

Genre:

Questionnaire & Opinion Survey (1.00)
Research Report > New Finding (0.68)

Industry: Consumer Products & Services > Personal Products > Health & Hygiene Products (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

Valencia, Kevin, Liu, Ziyang, Cui, Justin

Data-Efficient Ensemble Weather Forecasting with Diffusion Models

arXiv.org Artificial IntelligenceSep-16-2025

Although numerical weather forecasting methods have dominated the field, recent advances in deep learning methods, such as diffusion models, have shown promise in ensemble weather forecasting. However, such models are typically autoregressive and are thus computationally expensive. This is a challenge in climate science, where data can be limited, costly, or difficult to work with. In this work, we explore the impact of curated data selection on these autoregressive diffusion models. W e evaluate several data sampling strategies and show that a simple time stratified sampling approach achieves performance similar to or better than full-data training. Notably, it outperforms the full-data model on certain metrics and performs only slightly worse on others while using only 20% of the training data. Our results demonstrate the feasibility of data-efficient diffusion training, especially for weather forecasting, and motivates future work on adaptive or model-aware sampling methods that go beyond random or purely temporal sampling.

artificial intelligence, deep learning, machine learning, (16 more...)

2509.11047

Country: North America > United States > California (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.54)

Neural Information Processing SystemsAug-17-2025, 19:15:29 GMT

d9fc0cdb67638d50f411432d0d41d0ba-Supplemental.pdf

artificial intelligence, machine learning, ssr, (14 more...)

Country: North America > United States > California (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial IntelligenceJul-14-2025

Improving MLLM's Document Image Machine Translation via Synchronously Self-reviewing Its OCR Proficiency

Liang, Yupu, Zhang, Yaping, Zhang, Zhiyang, Chen, Zhiyuan, Zhao, Yang, Xiang, Lu, Zong, Chengqing, Zhou, Yu

Multimodal Large Language Models (MLLMs) have shown strong performance in document image tasks, especially Optical Character Recognition (OCR). However, they struggle with Document Image Machine Translation (DIMT), which requires handling both cross-modal and cross-lingual challenges. Previous efforts to enhance DIMT capability through Supervised Fine-Tuning (SFT) on the DIMT dataset often result in the forgetting of the model's existing monolingual abilities, such as OCR. To address these challenges, we introduce a novel fine-tuning paradigm, named Synchronously Self-Reviewing (SSR) its OCR proficiency, inspired by the concept "Bilingual Cognitive Advantage". Specifically, SSR prompts the model to generate OCR text before producing translation text, which allows the model to leverage its strong monolingual OCR ability while learning to translate text across languages. Comprehensive experiments demonstrate the proposed SSR learning helps mitigate catastrophic forgetting, improving the generalization ability of MLLMs on both OCR and DIMT tasks.

artificial intelligence, large language model, natural language, (19 more...)

2507.08309

Country:

Europe (1.00)
North America > United States (0.46)
North America > Canada (0.28)
Asia > Middle East > UAE (0.28)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)

arXiv.org Artificial IntelligenceJun-23-2025

SSR-Zero: Simple Self-Rewarding Reinforcement Learning for Machine Translation

Yang, Wenjie, Zheng, Mao, Song, Mingyang, Li, Zheng, Wang, Sitong

Large language models (LLMs) have recently demonstrated remarkable capabilities in machine translation (MT). However, most advanced MT-specific LLMs heavily rely on external supervision signals during training, such as human-annotated reference data or trained reward models (RMs), which are often expensive to obtain and challenging to scale. To overcome this limitation, we propose a Simple Self-Rewarding (SSR) Reinforcement Learning (RL) framework for MT that is reference-free, fully online, and relies solely on self-judging rewards. Training with SSR using 13K monolingual examples and Qwen-2.5-7B as the backbone, our model SSR-Zero-7B outperforms existing MT-specific LLMs, e.g., TowerInstruct-13B and GemmaX-28-9B, as well as larger general LLMs like Qwen2.5-32B-Instruct in English $\leftrightarrow$ Chinese translation tasks from WMT23, WMT24, and Flores200 benchmarks. Furthermore, by augmenting SSR with external supervision from COMET, our strongest model, SSR-X-Zero-7B, achieves state-of-the-art performance in English $\leftrightarrow$ Chinese translation, surpassing all existing open-source models under 72B parameters and even outperforming closed-source models, e.g., GPT-4o and Gemini 1.5 Pro. Our analysis highlights the effectiveness of the self-rewarding mechanism compared to the external LLM-as-a-judge approach in MT and demonstrates its complementary benefits when combined with trained RMs. Our findings provide valuable insight into the potential of self-improving RL methods. We have publicly released our code, data and models.

large language model, machine learning, translation, (17 more...)

2505.16637

Country: Asia (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)