SSR
Near-Optimal Randomized Exploration for Tabular Markov Decision Processes
These algorithms inject (carefully tuned) random noise into the value function to encourage exploration. UCB-type algorithms enjoy well-established theoretical guarantees but are difficult to implement, since an upper confidence bound is usually infeasible to compute for practical models such as neural networks. Practitioners therefore prefer randomized exploration, such as the noisy networks of [19], and algorithms with randomized exploration have been widely used in practice [37, 13, 11, 35].
- North America > United States > California (0.04)
- Europe > United Kingdom > England (0.04)
- Europe > Romania > Sud-Est Development Region > Constanța County > Constanța (0.04)
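The noise-injection idea above can be made concrete in a few lines. The sketch below is illustrative only, assuming a tabular Q-estimate and a Gaussian noise scale `sigma`; it is not the paper's exact algorithm, just the core mechanism of randomized exploration at action-selection time.

```python
import numpy as np

def noisy_greedy_action(q_values, sigma, rng):
    """Pick the greedy action after perturbing each Q-value with
    Gaussian noise -- the core mechanism of randomized exploration."""
    noise = rng.normal(0.0, sigma, size=q_values.shape)
    return int(np.argmax(q_values + noise))

rng = np.random.default_rng(0)
q = np.array([1.0, 0.9, 0.2])  # tabular Q-estimates for one state

# With sigma=0 the choice is purely greedy; with noise, near-optimal
# actions are also tried, so the agent keeps exploring.
counts = np.bincount(
    [noisy_greedy_action(q, sigma=0.5, rng=rng) for _ in range(1000)],
    minlength=3,
)
```

With `sigma=0.5`, the close runner-up (Q = 0.9) is selected almost as often as the greedy action, while the clearly worse action (Q = 0.2) is chosen rarely — exactly the behavior a UCB bonus would be tuned to produce, without computing any confidence bound.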
A Random Matrix Theory of Masked Self-Supervised Regression
Wortsman, Arie, Gerace, Federica, Loureiro, Bruno, Lu, Yue M.
Self-supervised learning (SSL) -- a training paradigm in which models learn useful representations from unlabeled data by exploiting the data itself as a source of supervision -- has emerged as a foundational component of the recent success of transformer architectures. By avoiding the need for manual annotations, SSL retains many of the benefits traditionally associated with supervised learning while avoiding reliance on labeled data. Consequently, SSL is widely adopted as a pretraining paradigm for learning general-purpose representations that substantially accelerate the optimization of downstream tasks, especially in data-scarce settings. A canonical example of a self-supervised learning task is masked language modeling (MLM), in which a neural network is trained to predict masked tokens in text using the remaining tokens as contextual information (Devlin et al., 2019a; Howard and Ruder, 2018; Radford et al., 2018; Brown et al., 2020; OpenAI, 2024). For example, given the sentence "The capital of France is Paris", a typical MLM task would be to teach the model to infer that the masked word is "capital" from the context words "France" and "Paris" in the masked sentence "The [MASK] of France is Paris".
- Europe > France (0.74)
- North America > United States (0.14)
- Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.04)
- (5 more...)
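The masked-prediction objective described above can be illustrated without any neural network at all. The toy sketch below (the mini-corpus and the co-occurrence scoring rule are assumptions for illustration, standing in for a learned model) scores candidate fillers for the `[MASK]` slot by whether they appear in a corpus sentence containing all the visible context words:

```python
# Toy stand-in for masked language modeling: score each candidate
# filler by whether some corpus sentence contains both the candidate
# and every visible (unmasked) context word.
corpus = [
    "the capital of france is paris",
    "the capital of italy is rome",
    "the population of france is large",
]

def fill_mask(masked, candidates, corpus):
    context = {w for w in masked.split() if w != "[MASK]"}
    def score(cand):
        return sum(
            context <= set(line.split()) and cand in line.split()
            for line in corpus
        )
    return max(candidates, key=score)

best = fill_mask("the [MASK] of france is paris",
                 ["capital", "population"], corpus)  # -> "capital"
```

Only "capital" co-occurs with the full context {"the", "of", "france", "is", "paris"}, so it wins; a real MLM learns a far richer version of this contextual evidence from data.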
Supplementary Material: Re-ranking for Image Retrieval and Transductive Few-shot Classification
Shen, Xi, Xiao, Yang
In Table 1, we provide an analysis of the hyper-parameters of our SSR on rOxford5K and rParis6K, and results with different numbers of neighbors N are provided in Section 3. Note that bold numbers are those reported in the paper. Hard training data sampling: the main difficulty is that there is no standard clean training set. To combine QE and SSR, we directly apply SSR to the retrieved samples given by QE. As the results show, in most cases our SSR again improves the performance of QE.
SSR: Socratic Self-Refine for Large Language Model Reasoning
Shi, Haizhou, Liu, Ye, Pang, Bo, Liu, Zeyu Leo, Wang, Hao, Savarese, Silvio, Xiong, Caiming, Zhou, Yingbo, Yavuz, Semih
Large Language Models (LLMs) have demonstrated remarkable reasoning abilities, yet existing test-time frameworks often rely on coarse self-verification and self-correction, limiting their effectiveness on complex tasks. In this paper, we propose Socratic Self-Refine (SSR), a novel framework for fine-grained evaluation and precise refinement of LLM reasoning. Our proposed SSR decomposes model responses into verifiable (sub-question, sub-answer) pairs, enabling step-level confidence estimation through controlled re-solving and self-consistency checks. By pinpointing unreliable steps and iteratively refining them, SSR produces more accurate and interpretable reasoning chains. Empirical results across five reasoning benchmarks and three LLMs show that SSR consistently outperforms state-of-the-art iterative self-refinement baselines. Beyond performance gains, SSR provides a principled black-box approach for evaluating and understanding the internal reasoning processes of LLMs. Code is available at https://github.com/SalesforceAIResearch/socratic-self-refine-reasoning.
- North America > United States > Texas > Travis County > Austin (0.04)
- Europe > Czechia > Prague (0.04)
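The step-level confidence estimation at the heart of SSR can be sketched compactly. In the sketch below, `solve` is a hypothetical callable standing in for an LLM call that re-answers a sub-question; the confidence of each (sub-question, sub-answer) pair is its self-consistency rate under repeated re-solving, and the least confident step is the one targeted for refinement. The function names and `k=8` are illustrative assumptions, not the paper's exact procedure.

```python
def step_confidence(sub_question, sub_answer, solve, k=8):
    """Re-solve the sub-question k times; confidence is the fraction
    of re-solves that agree with the original sub-answer."""
    votes = [solve(sub_question) for _ in range(k)]
    return votes.count(sub_answer) / k

def least_confident_step(chain, solve, k=8):
    """chain: list of (sub_question, sub_answer) pairs.
    Returns the index of the step to refine first."""
    confs = [step_confidence(q, a, solve, k) for q, a in chain]
    return min(range(len(chain)), key=confs.__getitem__)

# Deterministic fake solver for demonstration purposes.
fake_solve = lambda q: {"2+2": "4", "4*3": "12"}[q]
chain = [("2+2", "4"), ("4*3", "11")]       # second sub-answer is wrong
worst = least_confident_step(chain, fake_solve)  # -> 1
```

Because the re-solver consistently answers "12" while the chain recorded "11", step 1 gets confidence 0 and is flagged for refinement — the "pinpointing unreliable steps" behavior the abstract describes.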
LLMs Reproduce Human Purchase Intent via Semantic Similarity Elicitation of Likert Ratings
Maier, Benjamin F., Aslak, Ulf, Fiaschi, Luca, Rismal, Nina, Fletcher, Kemble, Luhmann, Christian C., Dow, Robbie, Pappas, Kli, Wiecki, Thomas V.
Consumer research costs companies billions annually yet suffers from panel biases and limited scale. Large language models (LLMs) offer an alternative by simulating synthetic consumers, but produce unrealistic response distributions when asked directly for numerical ratings. We present semantic similarity rating (SSR), a method that elicits textual responses from LLMs and maps these to Likert distributions using embedding similarity to reference statements. Testing on an extensive dataset comprising 57 personal care product surveys conducted by a leading corporation in that market (9,300 human responses), SSR achieves 90% of human test-retest reliability while maintaining realistic response distributions (KS similarity > 0.85). Additionally, these synthetic respondents provide rich qualitative feedback explaining their ratings. This framework enables scalable consumer research simulations while preserving traditional survey metrics and interpretability.
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Ireland (0.04)
- Questionnaire & Opinion Survey (1.00)
- Research Report > New Finding (0.68)
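The semantic similarity rating mapping can be sketched end to end. The sketch below uses toy bag-of-words "embeddings" in place of a real embedding model, one reference statement per point of a 5-point Likert scale, cosine similarity, and a softmax with temperature; the reference wording and the temperature value are illustrative assumptions, not the paper's calibrated setup.

```python
import numpy as np

# One (hypothetical) reference statement per Likert point, 1..5.
REFERENCES = [
    "i would definitely not buy this",
    "i probably would not buy this",
    "i might or might not buy this",
    "i probably would buy this",
    "i would definitely buy this",
]

def embed(text, vocab):
    """Toy bag-of-words embedding, L2-normalized."""
    v = np.array([text.split().count(w) for w in vocab], float)
    n = np.linalg.norm(v)
    return v / n if n else v

def ssr_distribution(response, references=REFERENCES, temp=0.1):
    """Map a free-text response to a distribution over the 5-point
    scale via softmax over similarities to the reference statements."""
    vocab = sorted({w for t in [response, *references] for w in t.split()})
    r = embed(response, vocab)
    sims = np.array([embed(ref, vocab) @ r for ref in references])
    logits = sims / temp
    p = np.exp(logits - logits.max())
    return p / p.sum()

dist = ssr_distribution("i would definitely buy this")
```

The output is a full distribution over the scale rather than a single forced rating, which is what lets SSR preserve realistic response spread instead of the collapsed distributions LLMs produce when asked for a number directly.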
StressTest: Can YOUR Speech LM Handle the Stress?
Yosha, Iddo, Maimon, Gallil, Adi, Yossi
Sentence stress refers to emphasis on words within a spoken utterance to highlight or contrast an idea. It is often used to imply an underlying intention that is not explicitly stated. Recent speech-aware language models (SLMs) have enabled direct audio processing, allowing models to access the full richness of speech to perform audio reasoning tasks such as spoken question answering. Despite the crucial role of sentence stress in shaping meaning and intent, it remains largely overlooked in the evaluation and development of SLMs. We address this gap by introducing StressTest, a benchmark designed to evaluate models' ability to distinguish between meanings of speech based on the stress pattern. We evaluate leading SLMs and find that, despite their overall capabilities, they perform poorly on such tasks. Hence, we propose a novel data generation pipeline and create Stress-17k, a training set that simulates the change of meaning implied by stress variation. Results suggest that our fine-tuned model, StresSLM, generalizes well to real recordings and notably outperforms existing SLMs on sentence stress reasoning and detection. Models, code, data, samples - pages.cs.huji.ac.il/adiyoss-lab/stresstest.
- North America > United States > New York (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (2 more...)
- Leisure & Entertainment (0.93)
- Education (0.93)
- Media (0.68)
- (5 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.95)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
- Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.68)
Data-Efficient Ensemble Weather Forecasting with Diffusion Models
Valencia, Kevin, Liu, Ziyang, Cui, Justin
Although numerical weather forecasting methods have dominated the field, recent advances in deep learning methods, such as diffusion models, have shown promise in ensemble weather forecasting. However, such models are typically autoregressive and thus computationally expensive. This is a challenge in climate science, where data can be limited, costly, or difficult to work with. In this work, we explore the impact of curated data selection on these autoregressive diffusion models. We evaluate several data sampling strategies and show that a simple time-stratified sampling approach achieves performance similar to or better than full-data training. Notably, it outperforms the full-data model on certain metrics and performs only slightly worse on others while using only 20% of the training data. Our results demonstrate the feasibility of data-efficient diffusion training, especially for weather forecasting, and motivate future work on adaptive or model-aware sampling methods that go beyond random or purely temporal sampling.
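Time-stratified sampling of the kind described above is simple to sketch: rather than drawing 20% of training snapshots uniformly at random, draw an equal share from each calendar month so the subset still covers the full seasonal cycle. The 20% fraction matches the abstract; grouping by month (and the hourly toy timestamps) are assumptions for illustration.

```python
import random
from collections import defaultdict
from datetime import datetime, timedelta

def time_stratified_sample(timestamps, fraction=0.2, seed=0):
    """Draw ~`fraction` of the timestamps from each calendar month,
    so every part of the seasonal cycle stays represented."""
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for ts in timestamps:
        buckets[ts.month].append(ts)
    sample = []
    for month_ts in buckets.values():
        k = max(1, round(fraction * len(month_ts)))
        sample.extend(rng.sample(month_ts, k))
    return sorted(sample)

# Toy dataset: one snapshot every 6 hours across 2020.
start = datetime(2020, 1, 1)
times = [start + timedelta(hours=6 * i) for i in range(4 * 365)]
subset = time_stratified_sample(times)
```

A purely uniform 20% sample could, by chance, under-represent whole months; the stratified version guarantees every month contributes, which is plausibly why it matches or beats full-data training at a fraction of the cost.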
Improving MLLM's Document Image Machine Translation via Synchronously Self-reviewing Its OCR Proficiency
Liang, Yupu, Zhang, Yaping, Zhang, Zhiyang, Chen, Zhiyuan, Zhao, Yang, Xiang, Lu, Zong, Chengqing, Zhou, Yu
Multimodal Large Language Models (MLLMs) have shown strong performance in document image tasks, especially Optical Character Recognition (OCR). However, they struggle with Document Image Machine Translation (DIMT), which requires handling both cross-modal and cross-lingual challenges. Previous efforts to enhance DIMT capability through Supervised Fine-Tuning (SFT) on the DIMT dataset often result in the forgetting of the model's existing monolingual abilities, such as OCR. To address these challenges, we introduce a novel fine-tuning paradigm, named Synchronously Self-Reviewing (SSR) its OCR proficiency, inspired by the concept of the "Bilingual Cognitive Advantage". Specifically, SSR prompts the model to generate OCR text before producing translation text, which allows the model to leverage its strong monolingual OCR ability while learning to translate text across languages. Comprehensive experiments demonstrate that the proposed SSR learning helps mitigate catastrophic forgetting, improving the generalization ability of MLLMs on both OCR and DIMT tasks.
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- Europe > Austria > Vienna (0.14)
- Europe > Poland > Podlaskie Province > Bialystok (0.05)
- (13 more...)
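The OCR-before-translation format described above amounts to structuring each training target in two stages. The sketch below shows one plausible way to build and parse such targets; the section markers and function names are assumptions for illustration, not the paper's exact template.

```python
# Hypothetical SSR-style training target: the model must first
# transcribe the document image (OCR), then translate, so the
# monolingual OCR skill keeps being rehearsed during DIMT training.
def ssr_target(ocr_text: str, translation: str) -> str:
    return (
        "### Recognized text:\n"
        f"{ocr_text}\n"
        "### Translation:\n"
        f"{translation}"
    )

def parse_ssr_output(output: str) -> tuple[str, str]:
    """Split a generated SSR output back into (ocr, translation)."""
    ocr_part, trans_part = output.split("### Translation:\n")
    ocr = ocr_part.replace("### Recognized text:\n", "").strip()
    return ocr, trans_part.strip()

target = ssr_target("Bonjour le monde", "Hello world")
```

Because the OCR segment is part of the supervised target, gradient updates continue to exercise the transcription ability that plain DIMT fine-tuning would otherwise erode.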
SSR-Zero: Simple Self-Rewarding Reinforcement Learning for Machine Translation
Yang, Wenjie, Zheng, Mao, Song, Mingyang, Li, Zheng, Wang, Sitong
Large language models (LLMs) have recently demonstrated remarkable capabilities in machine translation (MT). However, most advanced MT-specific LLMs heavily rely on external supervision signals during training, such as human-annotated reference data or trained reward models (RMs), which are often expensive to obtain and challenging to scale. To overcome this limitation, we propose a Simple Self-Rewarding (SSR) Reinforcement Learning (RL) framework for MT that is reference-free, fully online, and relies solely on self-judging rewards. Training with SSR using 13K monolingual examples and Qwen-2.5-7B as the backbone, our model SSR-Zero-7B outperforms existing MT-specific LLMs, e.g., TowerInstruct-13B and GemmaX-28-9B, as well as larger general LLMs like Qwen2.5-32B-Instruct in English ↔ Chinese translation tasks from the WMT23, WMT24, and Flores200 benchmarks. Furthermore, by augmenting SSR with external supervision from COMET, our strongest model, SSR-X-Zero-7B, achieves state-of-the-art performance in English ↔ Chinese translation, surpassing all existing open-source models under 72B parameters and even outperforming closed-source models, e.g., GPT-4o and Gemini 1.5 Pro. Our analysis highlights the effectiveness of the self-rewarding mechanism compared to the external LLM-as-a-judge approach in MT and demonstrates its complementary benefits when combined with trained RMs. Our findings provide valuable insight into the potential of self-improving RL methods. We have publicly released our code, data, and models.
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- Europe > Finland > Pirkanmaa > Tampere (0.04)
- Asia > Singapore (0.04)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
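The self-rewarding loop described in the SSR-Zero abstract can be sketched in outline: the policy model generates candidate translations and the *same* model, prompted as a judge, scores them, with the score serving as the RL reward. In the sketch below, `generate` and `judge` are hypothetical stand-ins for calls to the backbone LLM, and the group-mean baseline is an illustrative choice, not necessarily the paper's exact RL objective.

```python
def self_rewarding_step(source, generate, judge, n_samples=4):
    """One self-rewarding RL step: sample candidate translations,
    score each with the model's own judgment, and return per-sample
    advantages relative to the group-mean reward."""
    candidates = [generate(source) for _ in range(n_samples)]
    rewards = [judge(source, c) for c in candidates]
    baseline = sum(rewards) / len(rewards)
    advantages = [r - baseline for r in rewards]
    return candidates, advantages

# Deterministic stubs for demonstration: the "judge" simply prefers
# longer candidates here, standing in for a real self-judging prompt.
_cands = iter(["a", "ab", "abc", "abcd"])
candidates, advantages = self_rewarding_step(
    "src", generate=lambda s: next(_cands), judge=lambda s, c: len(c)
)
```

No reference translation or external reward model appears anywhere in the loop — the reference-free, fully online property the abstract claims — while the advantage signal still ranks candidates for the policy update.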