AITopics | mug

2511.11182

Country:

Asia > China > Hong Kong (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Slovenia > Central Slovenia > Municipality of Komenda > Komenda (0.04)
(3 more...)

Genre:

Research Report > New Finding (0.46)
Research Report > Promising Solution (0.34)

Industry: Health & Medicine > Therapeutic Area (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Neural Information Processing Systems - Nov-14-2025, 12:12:32 GMT

A Appendix A.1 Additional Method Justification

This problem has been studied in stochastic optimal control, particularly REPS [Peters et al., 2010]. In our experiments, we use soft actor-critic [Haarnoja et al., 2018] as our base RL algorithm. The policy and critic networks are MLPs with 2 fully-connected hidden layers of size 256. Following [Sharma et al., 2021b], we use a biased TD update. For all experiments using prior data collected through RL, the agent was initialized at test time with the pretrained policy and critic. The details for this environment are in [Sharma et al., 2021b].
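The network description above (MLPs with 2 fully-connected hidden layers of size 256) can be made concrete with a small sketch that lists the layer shapes and counts parameters. The helper names, the observation and action dimensions (17 and 6), the policy head emitting a mean and a log-std per action dimension, and the critic taking a concatenated state-action input are all assumptions for illustration; the snippet does not state them.

```python
from typing import List, Sequence, Tuple


def mlp_layer_shapes(in_dim: int, out_dim: int,
                     hidden: Sequence[int] = (256, 256)) -> List[Tuple[int, int]]:
    """Return (fan_in, fan_out) for each fully-connected layer of the MLP."""
    dims = [in_dim, *hidden, out_dim]
    return list(zip(dims[:-1], dims[1:]))


def mlp_param_count(in_dim: int, out_dim: int,
                    hidden: Sequence[int] = (256, 256)) -> int:
    """Count weights plus biases over all layers."""
    return sum(fan_in * fan_out + fan_out
               for fan_in, fan_out in mlp_layer_shapes(in_dim, out_dim, hidden))


# Hypothetical environment sizes, not taken from the source.
OBS_DIM, ACT_DIM = 17, 6

# Assumed policy head: one mean and one log-std per action dimension.
policy_params = mlp_param_count(OBS_DIM, 2 * ACT_DIM)
# Assumed critic: Q(s, a) on the concatenated state-action vector, scalar output.
critic_params = mlp_param_count(OBS_DIM + ACT_DIM, 1)

if __name__ == "__main__":
    print("policy layers:", mlp_layer_shapes(OBS_DIM, 2 * ACT_DIM))
    print("policy parameters:", policy_params)
    print("critic parameters:", critic_params)
```

Under these assumed sizes the two hidden layers dominate the parameter count, so the input and output dimensions have little effect on network size.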

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.34)

WIRED - Nov-12-2025, 20:43:02 GMT

Our Favorite Travel and Outdoor Gear Is on Sale at Huckberry

Huckberry's eclectic curation of travel clothing, coffee gear, and backpacks is on sale right now. Huckberry, purveyor of finely curated clothing and gear for the sort of person equally at home in the woods and the city, is having one of the company's rare site-wide sales this week, or pretty close to site-wide. We've tested and love quite a bit of Huckberry's stuff, especially the Proof 72-hour merino T-shirt. If you buy nothing else this year, buy that. Check out the other deals, which we've rounded up below.

artificial intelligence, chatbot, natural language, (14 more...)

WIRED

Country:

North America > United States > California (0.05)
Europe > Slovakia (0.05)
Europe > Czechia (0.05)
Asia > China (0.05)

Industry:

Information Technology (0.70)
Transportation (0.50)
Retail (0.50)

Technology:

Information Technology > Artificial Intelligence > Robots (0.48)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.47)

Neural Information Processing Systems - Oct-9-2025, 16:58:27 GMT

0142921fad7ef9192bd87229cdafa9d4-Paper-Conference.pdf

large language model, machine learning, natural language, (20 more...)

Country:

Asia > China > Zhejiang Province > Ningbo (0.14)
Asia > Japan > Shikoku > Kagawa Prefecture > Takamatsu (0.04)
Asia > China > Zhejiang Province > Hangzhou (0.04)

Genre: Research Report > Experimental Study (0.92)

Industry:

Media (0.67)
Leisure & Entertainment (0.67)
Education (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Communications (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)
(4 more...)

arXiv.org Artificial Intelligence - Oct-7-2025

Solving the Granularity Mismatch: Hierarchical Preference Learning for Long-Horizon LLM Agents

Gao, Heyang, Sun, Zexu, Min, Erxue, Cai, Hengyi, Wang, Shuaiqiang, Yin, Dawei, Chen, Xu

Large Language Models (LLMs) as autonomous agents are increasingly tasked with solving complex, long-horizon problems. Aligning these agents via preference-based offline methods like Direct Preference Optimization (DPO) is a promising direction, yet it faces a critical granularity mismatch. Trajectory-level DPO provides a signal that is too coarse for precise credit assignment, while step-level DPO is often too myopic to capture the value of multi-step behaviors. To resolve this challenge, we introduce Hierarchical Preference Learning (HPL), a hierarchical framework that optimizes LLM agents by leveraging preference signals at multiple, synergistic granularities. While HPL incorporates trajectory- and step-level DPO for global and local policy stability, its core innovation lies in group-level preference optimization guided by a dual-layer curriculum. Our approach first decomposes expert trajectories into semantically coherent action groups and then generates contrasting suboptimal groups to enable preference learning at a fine-grained, sub-task level. Then, instead of treating all preference pairs equally, HPL introduces a curriculum scheduler that organizes the learning process from simple to complex. This curriculum is structured along two axes: the group length, representing sub-task complexity, and the sample difficulty, defined by the reward gap between preferred and dispreferred action groups. Experiments on three challenging agent benchmarks show that HPL outperforms existing state-of-the-art methods. Our analyses demonstrate that the hierarchical DPO loss effectively integrates preference signals across multiple granularities, while the dual-layer curriculum is crucial for enabling the agent to solve a wide range of tasks, from simple behaviors to complex multi-step sequences.
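The abstract names two ingredients that a short sketch can illustrate: a DPO-style loss on a preference pair, and a curriculum that orders pairs by group length and by reward gap. The following is a minimal sketch, not the authors' implementation. The `GroupPair` container, its field names, the `beta` default, and the choice to treat a larger reward gap as an easier sample are assumptions; the loss is the standard DPO objective applied to one pair of summed log-probabilities.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class GroupPair:
    """One preferred/dispreferred action-group pair (hypothetical container)."""
    group_length: int           # number of steps in the action group
    reward_preferred: float     # reward estimate of the preferred group
    reward_dispreferred: float  # reward estimate of the dispreferred group
    logp_pref: float            # policy log-prob of the preferred group
    logp_disp: float            # policy log-prob of the dispreferred group
    ref_logp_pref: float        # reference-model log-prob, preferred group
    ref_logp_disp: float        # reference-model log-prob, dispreferred group


def dpo_loss(pair: GroupPair, beta: float = 0.1) -> float:
    """Standard DPO loss: -log(sigmoid(beta * (policy log-ratio gap)))."""
    margin = beta * ((pair.logp_pref - pair.ref_logp_pref)
                     - (pair.logp_disp - pair.ref_logp_disp))
    # Numerically stable -log(sigmoid(margin)) = log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def curriculum_order(pairs: List[GroupPair]) -> List[GroupPair]:
    """Order pairs from simple to complex: short groups first, then
    (as an assumption) larger reward gaps before smaller ones."""
    return sorted(
        pairs,
        key=lambda p: (p.group_length,
                       -(p.reward_preferred - p.reward_dispreferred)),
    )
```

A training loop built on this sketch would iterate over `curriculum_order(pairs)` and average `dpo_loss` per batch; how the actual scheduler stages the two axes is not specified in the abstract.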

large language model, machine learning, trajectory, (19 more...)

2510.03253

Country:

Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
Asia > China (0.04)

Genre: Research Report (1.00)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing Systems - Oct-1-2025, 21:15:56 GMT

A Task Details

Table 5: All task variations except shape used in VLMbench.

Table 6: All object models used in VLMbench.
Object type | Number of classes | Classes
Basic model | 3 | cube (1), triangular prism (1), cylinder (1)
Special model | 9 | star (1), moon (1), cross (1), flower (1), letter 't' (1), pencil (1), basket (1), box container (1), shape sorter (1)
Planar model | 6 | rectangle (1), circle (1), triangle (1), star (1), cross (1), flower (1)
Functional model | 2 | mug (6), sponge (1)
Articulated model | 2 | door with one rotatable handle (2), cabinet with three vertical drawers (3)

In the VLMbench, we show eight task categories: "Pick & Place objects", "Stack objects", "Drop ... When building an instance-level task with one variation, the other variations will also randomly change. For example, in the demonstrations of "Pick & Place objects" ... In the dataset, we have five types of objects, shown in Table 6. Visualizations can be found on the project website.

The object can be placed anywhere with any orientation inside the container. When the detector is triggered, the task is considered a success.

Instruction Templates: High-level instructions: "Pick up [target object description] and place it into [target container description]."; Low-level instructions: ("Move to the top of [target object ... "Move the object into [target container description]; ...

Variations and scene settings: All objects randomly change color, size, and position in each demonstration.

Color: There are two same-shape objects and two same-shape containers in the scene initialization. All colors are randomly sampled from the color library. The object description is "[color] object"; the container description is "[color] container."

Size: There are two same-shape objects and two same-shape containers in the scene initialization. One object and one container are randomly magnified while the others are randomly shrunk.

Relative Position: There are two same-shape objects and two same-shape containers in the scene initialization. The object description is "[front/rear/left/right] object"; the container description ...

The number of objects varies from two to the length of the object library. High-level instructions: "Stack [below object description] and [above object ... Low-level instructions: ("Move to the top of [above object description]; "Move the object on [below object description]; Release the ...

Object models: In the seen settings, five object models: star, triangular, cylinder, cube, moon.
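The high-level "Pick & Place objects" template and the description formats quoted above ("[color] object", "[color] container", "[front/rear/left/right] object") can be sketched as simple string templates. This is an illustrative sketch only: the function names are hypothetical, and only the formats that appear in full in the text are covered (the size descriptions and the relative-position container description are cut off in the source and are not guessed here).

```python
# High-level template quoted in the text, with slots for the two descriptions.
HIGH_LEVEL_TEMPLATE = "Pick up {obj} and place it into {container}."

RELATIVE_POSITIONS = ("front", "rear", "left", "right")


def object_description(attribute: str) -> str:
    """Build "[color] object" or "[front/rear/left/right] object"."""
    return f"{attribute} object"


def container_description(color: str) -> str:
    """Build "[color] container" (color variation only)."""
    return f"{color} container"


def pick_and_place_instruction(object_attribute: str, container_color: str) -> str:
    """Fill the high-level template (hypothetical helper)."""
    return HIGH_LEVEL_TEMPLATE.format(
        obj=object_description(object_attribute),
        container=container_description(container_color),
    )


if __name__ == "__main__":
    print(pick_and_place_instruction("red", "blue"))
    print(pick_and_place_instruction(RELATIVE_POSITIONS[2], "green"))
```

Keeping the descriptions separate from the sentence template mirrors how the text defines one template per task and one description format per variation.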

artificial intelligence, demonstration, scene initialization, (15 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.48)
Information Technology > Artificial Intelligence > Robots (0.47)

arXiv.org Artificial Intelligence - Sep-26-2025

ImaginationPolicy: Towards Generalizable, Precise and Reliable End-to-End Policy for Robotic Manipulation

Lu, Dekun, Gao, Wei, Jia, Kui

End-to-end robot manipulation policies offer significant potential for enabling embodied agents to understand and interact with the world. Unlike traditional modular pipelines, end-to-end learning mitigates key limitations such as information loss between modules and feature misalignment caused by isolated optimization targets. Despite these advantages, existing end-to-end neural networks for robotic manipulation--including those based on large VLM/VLA models--remain insufficiently performant for large-scale practical deployment. In this paper, we take a step towards an end-to-end manipulation policy that is generalizable, accurate and reliable. To achieve this goal, we propose a novel Chain of Moving Oriented Keypoints (CoMOK) formulation for robotic manipulation. Our formulation is used as the action representation of a neural policy, which can be trained in an end-to-end fashion. Such an action representation is general, as it extends the standard end-effector pose action representation and supports a diverse set of manipulation tasks in a unified manner. The oriented keypoint in our method enables natural generalization to objects with different shapes and sizes, while achieving sub-centimeter accuracy. Moreover, our formulation can easily handle multi-stage tasks, multi-modal robot behaviors, and deformable objects. Extensive simulated and hardware experiments demonstrate the effectiveness of our method.

affordance, artificial intelligence, machine learning, (18 more...)

2509.20841

Country:

North America > Canada > Alberta > Census Division No. 13 > Woodlands County (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Neural Information Processing Systems - Aug-15-2025, 05:17:29 GMT

A Appendix A.1 Additional Method Justification The key idea of Q

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.34)

Neural Information Processing Systems - Aug-7-2025, 00:25:14 GMT

04543a88eae2683133c1acbef5a6bf77-Supplemental-Datasets_and_Benchmarks.pdf

artificial intelligence, container, scene initialization, (16 more...)

Technology: Information Technology > Artificial Intelligence > Robots (0.47)

arXiv.org Artificial Intelligence - Aug-6-2025

Point2Act: Efficient 3D Distillation of Multimodal LLMs for Zero-Shot Context-Aware Grasping

Kim, Sang Min, Heo, Hyeongjun, Kim, Junho, Lee, Yonghyeon, Kim, Young Min

We propose Point2Act, which directly retrieves the 3D action point relevant for a contextually described task, leveraging Multimodal Large Language Models (MLLMs). Foundation models opened the possibility for generalist robots that can perform a zero-shot task following natural language descriptions within an unseen environment. While the semantics obtained from large-scale image and language datasets provide contextual understanding in 2D images, the rich yet nuanced features deduce blurry 2D regions and struggle to find precise 3D locations for actions. Our proposed 3D relevancy fields bypass the high-dimensional features and instead efficiently imbue lightweight 2D point-level guidance tailored to the task-specific action. The multi-view aggregation effectively compensates for misalignments due to geometric ambiguities, such as occlusion, or semantic uncertainties inherent in the language descriptions. The output region is highly localized, reasoning fine-grained 3D spatial context that can directly transfer to an explicit position for physical action at the on-the-fly reconstruction of the scene. Our full-stack pipeline, which includes capturing, MLLM querying, 3D reconstruction, and grasp pose extraction, generates spatially grounded responses in under 20 seconds, facilitating practical manipulation tasks. Project page: https://sangminkim-99.github.io/point2act/

artificial intelligence, large language model, natural language, (18 more...)

2508.03099

Country:

Africa > Togo (0.05)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > South Korea > Seoul > Seoul (0.04)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)