AITopics | viper


Neural Information Processing Systems | Nov-21-2025, 14:11:14 GMT

Verifiable Reinforcement Learning via Policy Extraction

decision tree policy, machine learning, reinforcement learning, (7 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)

Neural Information Processing Systems | Nov-19-2025, 21:44:17 GMT

Video Prediction Models as Rewards for Reinforcement Learning

Specifying reward signals that allow agents to learn complex behaviors is a longstanding challenge in reinforcement learning.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
North America > Canada (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)

Genre: Research Report (0.46)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Artificial Intelligence | Oct-29-2025

ViPER: Empowering the Self-Evolution of Visual Perception Abilities in Vision-Language Model

Zhang, Juntian, Jin, Song, Cheng, Chuanqi, Liu, Yuhan, Lin, Yankai, Zhang, Xun, Zhang, Yufei, Jiang, Fei, Yin, Guojun, Lin, Wei, Yan, Rui

The limited capacity for fine-grained visual perception presents a critical bottleneck for Vision-Language Models (VLMs) in real-world applications. Addressing this is challenging due to the scarcity of high-quality data and the limitations of existing methods: supervised fine-tuning (SFT) often compromises general capabilities, while reinforcement fine-tuning (RFT) prioritizes textual reasoning over visual perception. To bridge this gap, we propose a novel two-stage task that structures visual perception learning as a coarse-to-fine progressive process. Based on this task formulation, we develop ViPER, a self-bootstrapping framework specifically designed to enable iterative evolution through self-critiquing and self-prediction. By synergistically integrating image-level and instance-level reconstruction with a two-stage reinforcement learning strategy, ViPER establishes a closed-loop training paradigm, where internally synthesized data directly fuel the enhancement of perceptual ability. Applied to the Qwen2.5-VL family, ViPER produces the Qwen-Viper series. With an average gain of 1.7% on seven comprehensive benchmarks spanning various tasks and up to 6.0% on fine-grained perception, Qwen-Viper consistently demonstrates superior performance across different vision-language scenarios while maintaining generalizability. Beyond enabling self-improvement in perceptual capabilities, ViPER provides concrete evidence for the reciprocal relationship between generation and understanding, a breakthrough to developing more autonomous and capable VLMs.

arxiv preprint arxiv, large language model, machine learning, (19 more...)

2510.24285

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
Asia > China > Hubei Province > Wuhan (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

How snake bites really work

Vipers can strike within 100 milliseconds of launching at their prey. A venomous snake bite is not something you ever want to encounter on a hiking or camping trip. For those brave scientists who study snakes (aka herpetologists), the mechanics behind the reptiles' fast fangs are more fascinating than fear-inducing. Snakes must move incredibly quickly to sink their fangs into prey before the victim flinches.

artificial intelligence, fang, snake, (13 more...)

Popular Science

Country: North America > United States (0.15)

Genre: Research Report > New Finding (0.50)

Industry: Health & Medicine > Therapeutic Area > Environmental Medicine > Snake Bites (1.00)

Technology: Information Technology > Artificial Intelligence (0.50)

Neural Information Processing Systems | Oct-10-2025, 23:47:04 GMT

Video Prediction Models as Rewards for Reinforcement Learning

Alejandro Escontrela, Ademi Adeniji, Wilson Yan

Specifying reward signals that allow agents to learn complex behaviors is a longstanding challenge in reinforcement learning.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
North America > Canada (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)

Genre: Research Report (0.46)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Artificial Intelligence | Aug-26-2025

Instant Preference Alignment for Text-to-Image Diffusion Models

Li, Yang, Yang, Songlin, Han, Xiaoxuan, Wang, Wei, Dong, Jing, Lyu, Yueming, Xue, Ziyu

Text-to-image (T2I) generation has greatly enhanced creative expression, yet achieving preference-aligned generation in a real-time and training-free manner remains challenging. Previous methods often rely on static, pre-collected preferences or fine-tuning, limiting adaptability to evolving and nuanced user intents. In this paper, we highlight the need for instant preference-aligned T2I generation and propose a training-free framework grounded in multimodal large language model (MLLM) priors. Our framework decouples the task into two components: preference understanding and preference-guided generation. For preference understanding, we leverage MLLMs to automatically extract global preference signals from a reference image and enrich a given prompt using structured instruction design. Our approach supports broader and more fine-grained coverage of user preferences than existing methods. For preference-guided generation, we integrate global keyword-based control and local region-aware cross-attention modulation to steer the diffusion model without additional training, enabling precise alignment across both global attributes and local elements. The entire framework supports multi-round interactive refinement, facilitating real-time and context-aware image generation. Extensive experiments on the Viper dataset and our collected benchmark demonstrate that our method outperforms prior approaches in both quantitative metrics and human evaluations, and opens up new possibilities for dialog-based generation and MLLM-diffusion integration.

artificial intelligence, machine learning, natural language, (18 more...)

2508.17718

Country:

Asia > China > Jiangsu Province > Nanjing (0.04)
Asia > China > Hong Kong (0.04)
Africa > Middle East > Egypt (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Aissi, Mohamed Salim, Grislain, Clemence, Chetouani, Mohamed, Sigaud, Olivier, Soulier, Laure, Thome, Nicolas

VIPER: Visual Perception and Explainable Reasoning for Sequential Decision-Making

arXiv.org Artificial Intelligence | Mar-19-2025

While Large Language Models (LLMs) excel at reasoning on text and Vision-Language Models (VLMs) are highly effective for visual perception, applying those models for visual instruction-based planning remains a widely open problem. In this paper, we introduce VIPER, a novel framework for multimodal instruction-based planning that integrates VLM-based perception with LLM-based reasoning. Our approach uses a modular pipeline where a frozen VLM generates textual descriptions of image observations, which are then processed by an LLM policy to predict actions based on the task goal. We fine-tune the reasoning module using behavioral cloning and reinforcement learning, improving our agent's decision-making capabilities. Experiments on the ALFWorld benchmark show that VIPER significantly outperforms state-of-the-art visual instruction-based planners while narrowing the gap with purely text-based oracles. By leveraging text as an intermediate representation, VIPER also enhances explainability, paving the way for a fine-grained analysis of perception and reasoning components.

large language model, machine learning, natural language, (17 more...)

2503.15108

Country: Europe > France > Île-de-France > Paris > Paris (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing Systems | Oct-8-2024, 08:11:37 GMT

Reviews: Verifiable Reinforcement Learning via Policy Extraction

Post rebuttal: I thank the authors for the clarification. One minor point I realised concerns the equation between lines 144 and 145. Is this constraint really a disjunction over partitions? If there is at least one partition the given state doesn't belong to, the constraint would always be true, because at least one of the inner propositions would be true, wouldn't it? The trained decision tree policy can be verified for correctness, stability, and robustness.

artificial intelligence, machine learning, verifiable reinforcement learning, (9 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)

Vos, Daniël, Verwer, Sicco

Optimizing Interpretable Decision Tree Policies for Reinforcement Learning

arXiv.org Artificial Intelligence | Aug-21-2024

Reinforcement learning techniques leveraging deep learning have made tremendous progress in recent years. However, the complexity of neural networks prevents practitioners from understanding their behavior. Decision trees have gained increased attention in supervised learning for their inherent interpretability, enabling modelers to understand the exact prediction process after learning. This paper considers the problem of optimizing interpretable decision tree policies to replace neural networks in reinforcement learning settings. Previous works have relaxed the tree structure, restricted to optimizing only tree leaves, or applied imitation learning techniques to approximately copy the behavior of a neural network policy with a decision tree. We propose the Decision Tree Policy Optimization (DTPO) algorithm that directly optimizes the complete decision tree using policy gradients. Our technique uses established decision tree heuristics for regression to perform policy optimization. We empirically show that DTPO is a competitive algorithm compared to imitation learning algorithms for optimizing decision tree policies in reinforcement learning.

artificial intelligence, decision tree, machine learning, (17 more...)

2408.11632

Country:

Asia > Middle East > Jordan (0.04)
Europe > Netherlands > South Holland > Delft (0.04)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)