AITopics

Real-world robotic manipulation in homes and factories demands reliability, efficiency, and robustness that approach or surpass the performance of skilled human operators. We present RL-100, a real-world reinforcement learning framework built on diffusion-based visuomotor policies. RL-100 unifies imitation and reinforcement learning under a single PPO-style objective applied within the denoising process, yielding conservative and stable policy improvements across both offline and online stages. To meet deployment latency constraints, we employ a lightweight consistency distillation procedure that compresses multi-step diffusion into a one-step controller for high-frequency control. The framework is task-, embodiment-, and representation-agnostic, and supports both single-action outputs and action-chunking control. We evaluate RL-100 on seven diverse real-robot manipulation tasks, ranging from dynamic pushing and agile bowling to pouring, cloth folding, unscrewing, and multi-stage juicing. RL-100 attains 100% success across evaluated trials, achieving 900 out of 900 successful episodes, including up to 250 out of 250 consecutive trials on one task, and matches or surpasses expert teleoperators in time-to-completion. Without retraining, a single policy attains approximately 90% zero-shot success under environmental and dynamics shifts, adapts in a few-shot regime to significant task variations (86.7%), and remains robust to aggressive human perturbations (about 95%). In a public shopping-mall deployment, the juicing robot served random customers continuously for roughly seven hours without failure. Together, these results suggest a practical path toward deployment-ready robot learning: start from human priors, align training objectives with human-grounded metrics, and reliably extend performance beyond human demonstrations.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

2510.1483

Country:

North America > United States (0.46)
Asia > China (0.28)

Genre: Research Report > New Finding (0.87)

Industry: Education > Educational Setting (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Robots > Manipulation (0.66)

Tan, Daniel Jason, Chen, Jiayang, Perera, Dilruk, See, Kay Choong, Feng, Mengling

DeepEN: A Deep Reinforcement Learning Framework for Personalized Enteral Nutrition in Critical Care

Objective: Current ICU enteral feeding remains sub-optimal due to limited personalization and ongoing uncertainty about appropriate calorie, protein, and fluid targets--particularly in the context of rapidly changing metabolic demands and heterogeneous responses to therapeutic interventions. This study introduces DeepEN, a novel reinforcement learning (RL)-based framework designed to dynamically personalize enteral nutrition (EN) dosing for critically ill patients using electronic health record data. Methods: DeepEN was trained on data from over 11,000 ICU patients in the MIMIC-IV database to generate 4-hourly, patient-specific targets for caloric, protein, and fluid intake. The model's state space integrates demographics, comorbidities, vital signs, laboratory measurements, and recent interventions considered relevant to nutritional management. The reward function was designed with domain expertise to balance short-term physiological and nutrition-related goals with long-term survival outcomes, reflecting real-world clinical priorities. The framework employs a dueling double deep Q-network with Conservative Q-Learning regularization to ensure safe and reliable policy learning from retrospective data. Model performance was benchmarked against both clinician-derived and guideline-based policies. Results: DeepEN outperformed both clinician and guideline-based policies, achieving a 3.7 0.17 percentage-point absolute reduction in estimated morarXiv:2510.08350v2 [cs.LG] 19 Nov 2025 tality compared with the clinician policy (18.8% vs 22.5%) and higher expected returns relative to the gold-standard guideline policy (11.89 vs 8.11). Control of key nutritional biomarkers was also improved under the learned policy.

enteral nutrition, machine learning, reinforcement learning, (13 more...)

2510.0835

Country: Asia > Singapore (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Health Care Providers & Services (0.94)
Health & Medicine > Pharmaceuticals & Biotechnology (0.69)
Health & Medicine > Health Care Technology > Medical Record (0.68)
(5 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Euclid's Gift: Enhancing Spatial Perception and Reasoning in Vision-Language Models via Geometric Surrogate Tasks

Lian, Shijie, Wu, Changti, Yang, Laurence Tianruo, Yuan, Hang, Yu, Bin, Zhang, Lei, Chen, Kai

Spatial intelligence spans a rich suite of abilities, including visualising and transforming shapes, mentally rotating objects, judging relational positions and containment, and estimating numerosity. However, it still remains a critical unresolved challenge for Multimodal Large Language Models (MLLMs). To fill this gap, we propose to treat Euclidean geometry problem-solving as a surrogate task. Specifically, we meticulously constructed a curated multimodal dataset, called Euclid30K, comprising approximately 30K plane and solid geometry problems. Furthermore, to enable the model to learn and apply Euclidean principles from these geometry problems, we fine-tuned seven model variants (spanning 3--72B parameters) from the Qwen2.5VL, Qwen3VL, and RoboBrain2.0 families using Group Relative Policy Optimization (GRPO), inspiring the models to identify shapes, count, and relate entities, and perform multi-step deductive reasoning using Euclidean principles. Our experiments demonstrate that the resulting models achieve substantial zero-shot gains across four spatial reasoning benchmarks (Super-CLEVR, Omni3DBench, VSI-Bench, and MindCube) without any task-specific adaptations. Notably, after training on the Euclid30K, the mean VSI-Bench accuracy rose from 36.6\% to 41.8\% (+5.2\%), and the mean MindCube accuracy rose from 31.4\% to 38.1\% (+6.7\%). To our knowledge, this is the first systematic study showing that geometry-centric fine-tuning can confer vision-language models with broadly transferable spatial skills. Code and Euclid30K dataset can be found in \href{https://zgca-ai4edu.github.io/Euclids_Gift}{this}.

arxiv preprint arxiv, large language model, machine learning, (17 more...)

2509.24473

Country: Asia (0.46)

Genre: Research Report (1.00)

Industry: Education > Educational Setting (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Towards Alignment-Centric Paradigm: A Survey of Instruction Tuning in Large Language Models

Han, Xudong, Yang, Junjie, Wang, Tianyang, Bi, Ziqian, Song, Xinyuan, Hao, Junfeng, Song, Junhao

Instruction tuning is a pivotal technique for aligning large language models (LLMs) with human intentions, safety constraints, and domain-specific requirements. This survey provides a comprehensive overview of the full pipeline, encompassing (i) data collection methodologies, (ii) full-parameter and parameter-efficient fine-tuning strategies, and (iii) evaluation protocols. We categorized data construction into three major paradigms: expert annotation, distillation from larger models, and self-improvement mechanisms, each offering distinct trade-offs between quality, scalability, and resource cost. Fine-tuning techniques range from conventional supervised training to lightweight approaches, such as low-rank adaptation (LoRA) and prefix tuning, with a focus on computational efficiency and model reusability. We further examine the challenges of evaluating faithfulness, utility, and safety across multilingual and multimodal scenarios, highlighting the emergence of domain-specific benchmarks in healthcare, legal, and financial applications. Finally, we discuss promising directions for automated data generation, adaptive optimization, and robust evaluation frameworks, arguing that a closer integration of data, algorithms, and human feedback is essential for advancing instruction-tuned LLMs. This survey aims to serve as a practical reference for researchers and practitioners seeking to design LLMs that are both effective and reliably aligned with human intentions.

arxiv preprint arxiv, large language model, machine learning, (18 more...)

2508.17184

Country: Europe > United Kingdom (0.28)

Genre:

Overview (1.00)
Research Report > New Finding (0.92)

Industry:

Information Technology (1.00)
Education (1.00)
Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Jordán, Joaquín, Yin, Xavier, Fabros, Melissa, Ranade, Gireeja, Norouzi, Narges

MAGIC: Multi-Agent Argumentation and Grammar Integrated Critiquer

Automated Essay Scoring (AES) and Automatic Essay Feedback (AEF) systems aim to reduce the workload of human raters in educational assessment. However, most existing systems prioritize numerical scoring accuracy over feedback quality and are primarily evaluated on pre-secondary school level writing. This paper presents Multi-Agent Argumentation and Grammar Integrated Critiquer (MAGIC), a framework using five specialized agents to evaluate prompt adherence, persuasiveness, organization, vocabulary, and grammar for both holistic scoring and detailed feedback generation. To support evaluation at the college level, we collated a dataset of Graduate Record Examination (GRE) practice essays with expert-evaluated scores and feedback. MAGIC achieves substantial to near-perfect scoring agreement with humans on the GRE data, outperforming baseline LLM models while providing enhanced interpretability through its multi-agent approach. We also compare MAGIC's feedback generation capabilities against ground truth human feedback and baseline models, finding that MAGIC achieves strong feedback quality and naturalness.

large language model, machine learning, natural language, (19 more...)

2506.13037

Country:

North America > Mexico (0.28)
North America > United States (0.28)

Genre: Research Report > New Finding (0.68)

Industry:

Education > Educational Setting (1.00)
Education > Assessment & Standards (1.00)
Education > Educational Technology > Educational Software > Computer-Aided Assessment (0.89)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Lu, Tongyu, Geist, Charlotta-Marlena, Melechovsky, Jan, Roy, Abhinaba, Herremans, Dorien

MelodySim: Measuring Melody-aware Music Similarity for Plagiarism Detection

We propose MelodySim, a melody-aware music similarity model and dataset for plagiarism detection. First, we introduce a novel method to construct a dataset focused on melodic similarity. By augmenting Slakh2100, an existing MIDI dataset, we generate variations of each piece while preserving the melody through modifications such as note splitting, arpeggiation, minor track dropout, and re-instrumentation. A user study confirms that positive pairs indeed contain similar melodies, while other musical tracks are significantly changed. Second, we develop a segment-wise melodic-similarity detection model that uses a MERT encoder and applies a triplet neural network to capture melodic similarity. The resulting decision matrix highlights where plagiarism might occur. The experiments show that our model is able to outperform baseline models in detecting similar melodic fragments on the MelodySim test set.

artificial intelligence, machine learning, natural language, (15 more...)

2505.20979

Country: Europe (0.46)

Genre: Research Report (1.00)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)
Law (1.00)
Education > Educational Technology > Educational Software > Computer-Aided Assessment (0.63)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

GPA-RAM: Grasp-Pretraining Augmented Robotic Attention Mamba for Spatial Task Learning

Sheng, Juyi, Liu, Yangjun, Xu, Sheng, Yang, Zhixin, Liu, Mengyuan

Abstract--T ask failures in prior fine-grained robotic manipulation methods often stem from suboptimal initial grasping, which is critical for subsequent manipulation and reducing the requirement for complex pose adjustments. T o address this, we propose Grasp-Pretraining Augmentation (GPA)--a general multi-modal learning framework that enhances grasp perception without additional grasp pose data collection and labeling. GPA achieves evident enhancement on RLBench multi-task benchmark (from 79.3% to 84.2%) and ALOHA bimanual manipulation tasks (from 86%/16% to 98%/38%). Although GPA enhances fine-grained grasping performance by leveraging increased model capacity, it incurs computational latency and hinders real-time deployment. T o mitigate this limitation, we propose Robotic Attention Mamba (RAM). Our unified GPA-RAM framework balances model capacity with efficiency and applies to both discrete and continuous action generation. GPA-RAM demonstrates superior performance across four robotic systems with diverse camera configurations in both simulation and the real world. Compared with previous state-of-the-art methods, it improves average success rates by 8.2% over RVT2 (from 79.3% to 87.5%) and 2.6% over ARP This work provides a framework for developing robotic systems that are simultaneously precise and responsive. The project and code are at https://gpa-ram.github.io/ These authors contributed equally to this work. This work was supported by National Natural Science Foundation of China (No. 62473007), and Shenzhen Innovation in Science and Technology Foundation for The Excellent Y outh Scholars (No. RCYX20231211090248064). (Corresponding author: Mengyuan Liu.) Juyi Sheng, Peiming Li and Mengyuan Liu are with the State Key Laboratory of General Artificial Intelligence, Peking University, Shenzhen Graduate School, Shenzhen, 518055, China (email: logss2024@stu.pku.edu.cn;

artificial intelligence, machine learning, manipulation, (12 more...)

2504.19683

Country: Asia > China > Guangdong Province > Shenzhen (0.65)

Genre: Research Report > Promising Solution (0.34)

Industry: Education (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (0.46)