- Media (1.00)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- (3 more...)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- North America > United States > California > Alameda County > Berkeley (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Leisure & Entertainment > Games (0.46)
- Information Technology (0.46)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > United Kingdom (0.04)
TOIST: Task Oriented Instance Segmentation Transformer with Noun-Pronoun Distillation (Supplementary Material)
As mentioned in Section 3 (formulation) of the main paper, an input image may contain no objects, or multiple objects, that afford a specific task. As a reminder, we use the whole verb-pronoun (or verb-noun) description as the token span. With probability 0.5, an image is cropped to a random size, with each side between 384 and 1333 pixels. Both the student and teacher TOIST models are initialized with the model pre-trained by [4]. In an image, the most suitable objects (one or more) for solving the task are selected, and their bounding boxes are taken as ground-truth labels for detection.
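The cropping augmentation described above (crop with probability 0.5, each side between 384 and 1333 pixels) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation; the function name and the choice to sample a crop rectangle uniformly are assumptions.

```python
import random

def random_crop_box(image_w, image_h, p=0.5, min_side=384, max_side=1333):
    """With probability p, sample a crop whose sides lie in
    [min_side, max_side] (clamped to the image size); otherwise keep
    the full image. Returns the crop region as (x, y, w, h).
    Illustrative sketch: the sampling distribution is an assumption."""
    if random.random() >= p:
        return 0, 0, image_w, image_h
    # clamp the side range so the crop always fits inside the image
    w = random.randint(min(min_side, image_w), min(max_side, image_w))
    h = random.randint(min(min_side, image_h), min(max_side, image_h))
    x = random.randint(0, image_w - w)
    y = random.randint(0, image_h - h)
    return x, y, w, h
```

Each sampled box stays inside the image, so the crop can be applied directly to both the pixels and the ground-truth bounding boxes.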
The Good, The Bad, and The Hybrid: A Reward Structure Showdown in Reasoning Models Training
Reward design is central to reinforcement learning from human feedback (RLHF) and alignment research. In this work, we propose a unified framework to study hard, continuous, and hybrid reward structures for fine-tuning large language models (LLMs) on mathematical reasoning tasks. Using Qwen3-4B with LoRA fine-tuning on the GSM8K dataset, we formalize and empirically evaluate reward formulations that incorporate correctness, perplexity, reasoning quality, and consistency. We introduce an adaptive hybrid reward scheduler that transitions between discrete and continuous signals, balancing exploration and stability. Our results show that hybrid reward structures improve convergence speed and training stability over purely hard or continuous approaches, offering insights for alignment via adaptive reward modeling.
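The adaptive hybrid scheduler described in the abstract can be sketched as a blend of a hard correctness reward and a continuous quality score whose mixing weight follows a training-step schedule. This is a hedged sketch under assumed details: the linear ramp, the `warmup_frac` parameter, and the function names are illustrative, not the paper's exact recipe.

```python
def hybrid_reward(correct, continuous_score, step, total_steps, warmup_frac=0.3):
    """Blend a hard (0/1 correctness) reward with a continuous score
    (e.g. perplexity- or consistency-based quality in [0, 1]).

    Early in training the continuous signal dominates, encouraging
    exploration; the hard signal takes over as training stabilizes.
    The linear schedule and warmup_frac are illustrative assumptions.
    """
    hard = 1.0 if correct else 0.0
    # weight on the hard signal ramps linearly from 0 to 1
    alpha = min(1.0, step / (warmup_frac * total_steps))
    return alpha * hard + (1.0 - alpha) * continuous_score
```

Early on (`alpha == 0`) the reward equals the continuous score; after the warmup fraction of steps (`alpha == 1`) it is purely the discrete correctness signal.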
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.49)
DRL-Based Resource Allocation for Energy-Efficient IRS-Assisted UAV Spectrum Sharing Systems
Intelligent reflecting surface (IRS)-assisted unmanned aerial vehicle (UAV) systems provide a new paradigm for reconfigurable and flexible wireless communications. To enable more energy- and spectrum-efficient IRS-assisted UAV wireless communications, this paper introduces a novel IRS-assisted, UAV-enabled spectrum sharing system with orthogonal frequency division multiplexing (OFDM). The goal is to maximize the energy efficiency (EE) of the secondary network by jointly optimizing the beamforming, subcarrier allocation, IRS phase shifts, and UAV trajectory, subject to practical transmit-power and passive-reflection constraints as well as the UAV's physical limitations. A physically grounded propulsion-energy model is adopted, with its tight upper bound used to form a tractable EE lower bound for the spectrum sharing system. To handle the highly non-convex, time-coupled optimization problem with a mixed continuous-discrete policy space, we develop a deep reinforcement learning (DRL) approach based on the actor-critic framework. Extensive experiments show a significant EE improvement of the proposed DRL-based approach over several benchmark schemes, demonstrating its effectiveness and robustness under mobility.
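The EE objective rests on a propulsion-energy model. A commonly used rotary-wing model (blade-profile, induced, and parasite power as a function of flight speed) is sketched below; the parameter values are typical illustrative defaults, and this paper's exact model and bounds may differ.

```python
import math

def propulsion_power(v, P0=79.86, Pi=88.63, v0=4.03,
                     d0=0.6, rho=1.225, s=0.05, A=0.503, Utip=120.0):
    """Rotary-wing UAV propulsion power at speed v (m/s): blade-profile
    + induced + parasite terms. Parameter values are illustrative
    defaults, not taken from this paper."""
    profile = P0 * (1.0 + 3.0 * v**2 / Utip**2)
    induced = Pi * math.sqrt(
        max(math.sqrt(1.0 + v**4 / (4.0 * v0**4)) - v**2 / (2.0 * v0**2), 0.0))
    parasite = 0.5 * d0 * rho * s * A * v**3
    return profile + induced + parasite

def energy_efficiency(bits_delivered, speeds, dt):
    """EE = secondary-network throughput / propulsion energy over a
    trajectory discretized into steps of duration dt (a simplification
    of the paper's objective)."""
    energy = sum(propulsion_power(v) * dt for v in speeds)
    return bits_delivered / energy
```

At hover (`v = 0`) the parasite term vanishes and the power reduces to `P0 + Pi`; the induced term decreases with speed, which is why EE-optimal trajectories avoid hovering.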
- North America > United States (1.00)
- Asia > China > Beijing > Beijing (0.04)
- Europe > United Kingdom > England > Nottinghamshire > Nottingham (0.04)
- Government > Tax (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
- Information Technology > Communications > Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.88)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)
Stop-RAG: Value-Based Retrieval Control for Iterative RAG
Park, Jaewan, Cho, Solbee, Lee, Jay-Yoon
Iterative retrieval-augmented generation (RAG) enables large language models to answer complex multi-hop questions, but each additional loop increases latency, costs, and the risk of introducing distracting evidence, motivating the need for an efficient stopping strategy. Existing methods either use a predetermined number of iterations or rely on confidence proxies that poorly reflect whether more retrieval will actually help. We cast iterative RAG as a finite-horizon Markov decision process and introduce Stop-RAG, a value-based controller that adaptively decides when to stop retrieving. Trained with full-width forward-view Q($\lambda$) targets from complete trajectories, Stop-RAG learns effective stopping policies while remaining compatible with black-box APIs and existing pipelines. On multi-hop question-answering benchmarks, Stop-RAG consistently outperforms both fixed-iteration baselines and prompting-based stopping with LLMs. These results highlight adaptive stopping as a key missing component in current agentic systems, and demonstrate that value-based control can improve the accuracy of RAG systems.
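The forward-view lambda-return targets mentioned in the abstract can be computed with a single backward pass over a completed trajectory. The sketch below is the generic recursion, not Stop-RAG's exact estimator; the bootstrap values `q_next[t]` stand in for the controller's value of the best action at the next state (0 at the terminal step).

```python
def q_lambda_targets(rewards, q_next, lam=0.9, gamma=1.0):
    """Forward-view lambda-return targets computed backward over a
    complete trajectory:
        G_t = r_t + gamma * ((1 - lam) * q_next[t] + lam * G_{t+1})
    rewards[t] is the reward after step t; q_next[t] is the bootstrap
    value at the next state (0.0 at the terminal step). A generic
    sketch under assumed notation, not the paper's exact estimator."""
    T = len(rewards)
    targets = [0.0] * T
    g = q_next[-1]  # beyond the horizon; 0.0 for a terminal state
    for t in reversed(range(T)):
        # blend the one-step bootstrap with the longer lambda-return
        g = rewards[t] + gamma * ((1.0 - lam) * q_next[t] + lam * g)
        targets[t] = g
    return targets
```

With `lam = 1` the targets reduce to Monte Carlo returns from the complete trajectory; with `lam = 0` they reduce to one-step bootstrapped targets, which is the usual bias-variance dial.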
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- Asia > Thailand > Bangkok > Bangkok (0.04)
- Asia > South Korea > Seoul > Seoul (0.04)
- (10 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.95)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)
- North America > United States (1.00)
- Oceania > Australia (0.14)
- Asia > Thailand (0.14)
- (8 more...)
- Media (1.00)
- Law (1.00)
- Health & Medicine (1.00)
- (3 more...)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- North America > United States > California > Alameda County > Berkeley (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Leisure & Entertainment > Games (0.46)
- Information Technology (0.46)