AITopics | webshop

Collaborating Authors

webshop

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Group-in-Group Policy Optimization for LLM Agent Training

Neural Information Processing Systems, Jun-16-2026, 19:43:55 GMT

Recent advances in group-based reinforcement learning (RL) have driven frontier large language models (LLMs) forward on single-turn tasks such as mathematical reasoning. However, their scalability to multi-turn LLM agent training remains limited. Unlike static tasks, agent-environment interactions unfold over many steps and often yield sparse or delayed rewards, which makes credit assignment across individual steps significantly more challenging. In this work, we propose Group-in-Group Policy Optimization (GiGPO), a novel RL algorithm that achieves fine-grained credit assignment for LLM agents while preserving the appealing properties of group-based RL: it is critic-free, has a low memory cost, and converges stably. GiGPO introduces a two-level structure for estimating relative advantage: (i) at the episode level, GiGPO computes macro relative advantages over groups of complete trajectories; (ii) at the step level, GiGPO introduces an anchor state grouping mechanism that retroactively constructs step-level groups by identifying repeated environment states across trajectories. Actions taken from the same state are grouped together, enabling micro relative advantage estimation.
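The abstract describes the two-level advantage only in words. The Python sketch below is a minimal illustration under stated assumptions and is not the authors' implementation:

- It normalizes returns by the group mean and population standard deviation, a GRPO-style choice assumed here.
- It treats any hashable value, such as the raw text observation, as the anchor-state key.
- It takes a per-step return supplied by the caller.
- It combines the two levels as a weighted sum with a hypothetical weight `omega`.

All function names are illustrative.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Hashable, Sequence


def episode_advantages(returns: Sequence[float], eps: float = 1e-6) -> list[float]:
    """Macro (episode-level) advantages: normalize each trajectory's return
    against the other trajectories rolled out for the same task (assumed scheme)."""
    mu, sigma = mean(returns), pstdev(returns)
    return [(r - mu) / (sigma + eps) for r in returns]


def step_advantages(steps: Sequence[tuple[Hashable, float]], eps: float = 1e-6) -> list[float]:
    """Micro (step-level) advantages via anchor-state grouping.

    `steps` pools (state_key, step_return) pairs from every trajectory in the
    group. Steps that share an identical state key form one step-level group;
    a state seen only once gets advantage 0 because it has no peers to compare to.
    """
    groups: dict[Hashable, list[int]] = defaultdict(list)
    for i, (state, _) in enumerate(steps):
        groups[state].append(i)
    adv = [0.0] * len(steps)
    for idxs in groups.values():
        rets = [steps[i][1] for i in idxs]
        mu, sigma = mean(rets), pstdev(rets)
        for i in idxs:
            adv[i] = (steps[i][1] - mu) / (sigma + eps)
    return adv


def combined_advantages(ep_adv: Sequence[float], st_adv: Sequence[float],
                        traj_of_step: Sequence[int], omega: float = 1.0) -> list[float]:
    """Hypothetical combination: the episode advantage of the step's trajectory
    plus omega times the step's own advantage."""
    return [ep_adv[traj_of_step[i]] + omega * a for i, a in enumerate(st_adv)]
```

For example, episode returns `[1.0, 0.0, 1.0, 0.0]` give macro advantages of roughly `[1, -1, 1, -1]`. Two steps taken from the same state with returns 1.0 and 0.0 receive micro advantages of roughly 1 and -1.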

large language model, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.28)

Genre:

Workflow (1.00)
Research Report > Experimental Study (1.00)

Industry:

Information Technology (0.92)
Media (0.69)
Education (0.67)
Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Group-in-Group Policy Optimization for LLM Agent Training

Neural Information Processing Systems, Jun-11-2026, 22:10:43 GMT

Recent advances in group-based reinforcement learning (RL) have driven frontier large language models (LLMs) forward on single-turn tasks such as mathematical reasoning. However, their scalability to multi-turn LLM agent training remains limited. Unlike static tasks, agent-environment interactions unfold over many steps and often yield sparse or delayed rewards, which makes credit assignment across individual steps significantly more challenging. In this work, we propose Group-in-Group Policy Optimization (GiGPO), a novel RL algorithm that achieves fine-grained credit assignment for LLM agents while preserving the appealing properties of group-based RL: it is critic-free, has a low memory cost, and converges stably. GiGPO introduces a two-level structure for estimating relative advantage: (i) at the episode level, GiGPO computes macro relative advantages over groups of complete trajectories; (ii) at the step level, GiGPO introduces an anchor state grouping mechanism that retroactively constructs step-level groups by identifying repeated environment states across trajectories. Actions taken from the same state are grouped together, enabling micro relative advantage estimation.

artificial intelligence, large language model, natural language, (12 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

d8efbb5dd415974eb095c3f06bff1f48-Paper-Conference.pdf

Neural Information Processing Systems, Feb-18-2026, 08:24:46 GMT

large language model, machine learning, natural language, (16 more...)

Neural Information Processing Systems

Country:

Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
North America > United States > Michigan (0.04)
Asia > Singapore (0.04)
(2 more...)

Genre: Research Report > Experimental Study (0.93)

Industry:

Education > Educational Setting > Online (0.68)
Information Technology > Services (0.46)

Technology:

Information Technology > Communications (0.94)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.46)

1 Details about the observation formats. Figure 1: Example of the observation of WebShop. The observation of WebShop is simplified based on the text_rich

Neural Information Processing Systems, Feb-18-2026, 00:22:25 GMT

The observation of WikiHow is represented in exactly the same way as in Zhang et al. [2023].

Table 1: Patterns of WebShop pages (pattern: description)
search: the page to search for an item
itemlisting: the page listing the search results
item: the information page of a specific item
others: the item description page, item feature page, and review page

The similarity lookup table is defined in Table 2.

Table 2: Lookup table of the page similarity of WebShop
             search  itemlisting  item  others
search       1       0            0     0
itemlisting  0       1            0     0
item         0       0            1     0.3
others       0       0            0.3   1

2.2 Lookup table of the instruction similarity function of WikiHow

Table 3: Patterns of WikiHow instructions (pattern name: pattern template)
search: Search an article to learn . . .

Owing to budget limits, only a subset of 20 tasks is sampled from the full test set. The visualization is available in Figure 2. It can be seen that the performance of R [...] However, the performance appears to saturate, which may be attributed to the limited number of active exemplars and training tasks. The average reward saturates later than the success rate does. Double Q-Learning [van Hasselt, 2010] is commonly used to mitigate overestimation in lookup-based Q-Learning.
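Table 2 is small enough to encode directly as a lookup. The Python sketch below is a hypothetical helper. The name `page_similarity` and the choice to raise an error on unknown patterns are assumptions. It only reproduces the table values and does not classify raw pages into patterns, because the excerpt does not describe how that is done.

```python
# Page patterns from Table 1.
PAGE_PATTERNS = ("search", "itemlisting", "item", "others")

# Off-diagonal non-zero entries of Table 2. The table is symmetric, its
# diagonal is 1, and every other pair is 0, so a frozenset key suffices.
_OFF_DIAGONAL = {frozenset(("item", "others")): 0.3}


def page_similarity(a: str, b: str) -> float:
    """Look up the similarity of two WebShop page patterns (Table 2)."""
    for p in (a, b):
        if p not in PAGE_PATTERNS:
            raise ValueError(f"unknown page pattern: {p!r}")
    if a == b:
        return 1.0
    return _OFF_DIAGONAL.get(frozenset((a, b)), 0.0)
```

Keying the off-diagonal entries by an unordered pair keeps the symmetry of Table 2 without storing each value twice.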

artificial intelligence, machine learning, webshop, (16 more...)

Neural Information Processing Systems

Country: North America > Canada > British Columbia (0.05)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

From Pixels to UI Actions: Learning to Follow Instructions via Graphical User Interfaces

Peter Shaw

Neural Information Processing Systems, Feb-13-2026, 15:36:41 GMT

Much of the previous work towards digital agents for graphical user interfaces (GUIs) has relied on text-based representations (derived from HTML or other structured data sources), which are not always readily available.

demonstration, machine learning, reinforcement learning, (21 more...)

Neural Information Processing Systems

Country: North America > United States (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Information Technology > Security & Privacy (0.67)
Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Communications (1.00)
(4 more...)

WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents

Neural Information Processing Systems, Feb-10-2026, 08:31:02 GMT

Instruction: I'm looking for a small portable folding desk that is already fully assembled [...] [btn] Back to Search [/btn] Page 1 (Total results: 50) [btn] Next [/btn] [btn] MENHG Folding Breakfast Tray [...] [/btn] $109.0 [btn]
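In this excerpt of a WebShop text observation, clickable elements appear between `[btn]` and `[/btn]` markers. The Python sketch below pulls those labels out, for example to enumerate candidate click targets. The function name is hypothetical. The regular expression assumes that a label never spans lines and never contains a literal `[/btn]`.

```python
import re

# Clickable elements are wrapped as "[btn] label [/btn]" in the text observation.
# Non-greedy matching keeps adjacent buttons separate; an unclosed "[btn]"
# (as at the truncated end of the excerpt) is simply not matched.
_BUTTON = re.compile(r"\[btn\]\s*(.*?)\s*\[/btn\]")


def button_labels(observation: str) -> list[str]:
    """Return the labels of all clickable elements, in page order."""
    return _BUTTON.findall(observation)
```

Applied to the excerpt above, this yields `"Back to Search"`, `"Next"` and `"MENHG Folding Breakfast Tray [...]"`. The elided product name stays elided.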

latexit sha1, machine learning, natural language, (12 more...)

Neural Information Processing Systems

Country:

Europe > Middle East > Malta (0.04)
Asia > Japan > Honshū > Chūbu > Toyama Prefecture > Toyama (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.96)

WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents

Neural Information Processing Systems, Dec-24-2025, 15:40:31 GMT

Most existing benchmarks for grounding language in interactive environments either lack realistic linguistic elements or prove difficult to scale up due to substantial human involvement in the collection of data or feedback signals. We develop WebShop, a simulated e-commerce website environment with 1.18 million real-world products and 12,087 crowd-sourced text instructions. In this environment, an agent needs to navigate multiple types of webpages and issue diverse actions to find, customize, and purchase a product given an instruction. WebShop poses several challenges, including understanding compositional instructions, query (re-)formulation, dealing with noisy text in webpages, and performing strategic exploration. We collect over 1,600 human trajectories to first validate the benchmark, then train and evaluate a diverse range of agents using reinforcement learning, imitation learning, and pre-trained image and language models. Our best model achieves a task success rate of 29%, which significantly outperforms rule heuristics but is far lower than expert human performance (59%). We also analyze agent and human trajectories and ablate various model components to provide insights for developing future agents with stronger language understanding and decision-making abilities. Finally, we show that our agent trained on WebShop exhibits non-trivial sim-to-real transfer when evaluated on amazon.com.
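The abstract reports a task success rate. The Python sketch below computes it together with a task score, assuming the evaluation setup of the WebShop paper:

- Each episode ends with a reward in [0, 1].
- The task score is 100 times the mean reward.
- An episode counts as a success only when its reward is exactly 1.

The function name is hypothetical.

```python
from typing import Sequence


def webshop_metrics(rewards: Sequence[float]) -> tuple[float, float]:
    """Return (score, success_rate), both on a 0-100 scale.

    Assumes `rewards` holds one final reward in [0, 1] per evaluated instruction.
    """
    if not rewards:
        raise ValueError("need at least one episode")
    n = len(rewards)
    # Score: mean reward, scaled to 0-100.
    score = 100.0 * sum(rewards) / n
    # Success rate: share of episodes with a perfect reward (assumed criterion).
    success_rate = 100.0 * sum(1 for r in rewards if r == 1.0) / n
    return score, success_rate
```

For rewards `[1.0, 0.5, 0.0, 1.0]`, this gives a score of 62.5 and a success rate of 50.0. Partial purchases raise the score without counting as successes.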

interaction, name change, webshop, (6 more...)

Neural Information Processing Systems

Industry: Information Technology > Services > e-Commerce Services (0.59)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.76)
Information Technology > Artificial Intelligence > Natural Language (0.59)

DEPO: Dual-Efficiency Preference Optimization for LLM Agents

Chen, Sirui, Zhao, Mengshi, Xu, Lei, Zhao, Yuying, Zhu, Beier, Zhang, Hanwang, Zhao, Shengjie, Lu, Chaochao

arXiv.org Artificial Intelligence, Nov-20-2025

Recent advances in large language models (LLMs) have greatly improved their reasoning and decision-making abilities when deployed as agents. Richer reasoning, however, often comes at the cost of longer chains of thought (CoT), hampering interaction efficiency in real-world scenarios. Moreover, a systematic definition of LLM agent efficiency is still lacking, which hinders targeted improvements. To this end, we introduce dual-efficiency, comprising (i) step-level efficiency, which minimizes tokens per step, and (ii) trajectory-level efficiency, which minimizes the number of steps to complete a task. Building on this definition, we propose DEPO, a dual-efficiency preference optimization method that jointly rewards succinct responses and fewer action steps. Experiments on WebShop and BabyAI show that DEPO cuts token usage by up to 60.9% and steps by up to 26.9%, while achieving up to a 29.3% improvement in performance. DEPO also generalizes to three out-of-domain math benchmarks and retains its efficiency gains when trained on only 25% of the data. Our project page is at https://opencausalab.github.io/DEPO.
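The two efficiency notions can be measured on each trajectory. The Python sketch below assumes a trajectory is recorded as a list of per-step token counts, using a hypothetical `Trajectory` type. It computes the two quantities the abstract says DEPO minimizes. It also computes a percentage reduction of the kind quoted in the abstract, assuming such figures are measured relative to a baseline. It does not implement DEPO's preference optimization itself.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    tokens_per_step: list[int]  # tokens the agent generated at each step (hypothetical record)


def step_level_cost(traj: Trajectory) -> float:
    """Average tokens per step: lower means better step-level efficiency."""
    return sum(traj.tokens_per_step) / len(traj.tokens_per_step)


def trajectory_level_cost(traj: Trajectory) -> int:
    """Number of steps taken: lower means better trajectory-level efficiency."""
    return len(traj.tokens_per_step)


def relative_reduction(baseline: float, new: float) -> float:
    """Percentage reduction of a cost relative to a baseline value."""
    return 100.0 * (baseline - new) / baseline
```

A three-step trajectory with 10, 20 and 30 tokens per step has a step-level cost of 20 tokens and a trajectory-level cost of 3 steps. Going from 1,000 to 391 tokens is a 60.9% reduction.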

efficiency, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2511.15392

Country: Asia > China (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

AutoGuide: Automated Generation and Selection of Context-Aware Guidelines for Large Language Model Agents

Neural Information Processing Systems, Oct-10-2025, 18:24:56 GMT

Recent advances in large language models (LLMs) have empowered AI agents capable of performing various sequential decision-making tasks.

agent, context-aware guideline, guideline, (13 more...)

Neural Information Processing Systems

Country:

Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
North America > United States > Michigan (0.04)
Asia > Singapore (0.04)
(2 more...)

Genre: Research Report > Experimental Study (0.93)

Industry:

Education > Educational Setting > Online (0.68)
Information Technology > Services (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
