AITopics | Retail

Can LLM Agents Simulate Multi-Turn Human Behavior? Evidence from Real Online Customer Behavior Data

Lu, Yuxuan, Huang, Jing, Han, Yan, Yao, Bingsheng, Bei, Sisong, Gesi, Jiri, Xie, Yaochen, Wang, Zheshen, He, Qi, Wang, Dakuo

arXiv.org Artificial Intelligence | Oct-10-2025

Recent research shows that LLM agents can generate "believable" human behaviors via prompt-only methods, and such agents have been increasingly adopted in downstream applications. However, existing evaluations of these agents focus only on qualitative believability (whether human raters think they are accurate), leaving open the question of whether LLM agents can accurately generate step-by-step actions that mimic a particular human's behavior in a multi-turn interaction task. In this work, we take shopping as a case study and present the first large-scale quantitative evaluation of state-of-the-art LLMs' ability to accurately simulate human behavior. Using real-world data from 31,865 online shopping sessions containing 230,965 user actions, our evaluation reveals that prompt-based LLMs (DeepSeek-R1, Llama, Claude) achieve only 11.86% accuracy in generating human actions, highlighting a substantial gap in actual behavioral accuracy. Through experiments, we also show that strategies as simple as fine-tuning LLMs on real human click-through data augmented with synthesized reasoning traces can greatly enhance models' performance. The fine-tuned Qwen2.5-7B achieves 17.26% action generation accuracy and a 33.86% F1 score on final purchase prediction, representing substantial improvements of 5.4 and 13.85 percentage points, respectively, over prompt-only baselines. This work establishes the first rigorous benchmark for human behavior simulation and provides actionable insights for developing more accurate LLM agents for future downstream applications.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2503.20749

Genre: Research Report > New Finding (1.00)

Industry:

Retail > Online (0.68)
Information Technology > Services > e-Commerce Services (0.36)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

Parameter-Free Federated TD Learning with Markov Noise in Heterogeneous Environments

Naskar, Ankur, Thoppe, Gugan, Negi, Utsav, Gupta, Vijay

arXiv.org Artificial Intelligence | Oct-10-2025

Federated learning (FL) can dramatically speed up reinforcement learning by distributing exploration and training across multiple agents. It can guarantee an optimal convergence rate that scales linearly in the number of agents, i.e., a rate of $\tilde{O}(1/(NT))$, where $T$ is the iteration index and $N$ is the number of agents. However, when the training samples arise from a Markov chain, existing results on TD learning that achieve this rate require the algorithm to depend on unknown problem parameters. We close this gap by proposing a two-timescale Federated Temporal Difference (FTD) learning method with Polyak-Ruppert averaging. Our method provably attains the optimal $\tilde{O}(1/(NT))$ rate in both average-reward and discounted settings, offering a parameter-free FTD approach for Markovian data. Although our results are novel even in the single-agent setting, they apply to the more realistic and challenging scenario of FL with heterogeneous environments.
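
To make the ingredients named in the abstract concrete, here is a minimal sketch of federated TD(0) with iterate averaging on a toy Markov chain. It is not the paper's algorithm: it uses a single timescale, a tabular value function, identical (not heterogeneous) environments for all agents, and a hand-picked step-size schedule. The function names `true_values` and `federated_td`, the step-size exponent, and the choice to average only the second half of the iterates are all assumptions made for illustration.

```python
import random


def true_values(P, r, gamma, sweeps=2000):
    """Reference values via repeated Bellman backups (deterministic)."""
    n = len(r)
    v = [0.0] * n
    for _ in range(sweeps):
        v = [r[s] + gamma * sum(P[s][k] * v[k] for k in range(n))
             for s in range(n)]
    return v


def federated_td(P, r, gamma, num_agents=8, steps=20000, seed=0):
    """Tabular TD(0) with server-side averaging of agent updates.

    Assumption: every agent follows the same Markov chain P with
    rewards r; the paper's heterogeneous setting is not modeled here.
    """
    rng = random.Random(seed)
    n = len(r)
    theta = [0.0] * n          # shared value estimate held by the server
    avg = [0.0] * n            # running average of the shared iterates
    kept = 0
    states = [rng.randrange(n) for _ in range(num_agents)]
    for t in range(steps):
        alpha = (t + 1) ** -0.6            # illustrative decaying step size
        update = [0.0] * n
        for i, s in enumerate(states):
            nxt = rng.choices(range(n), weights=P[s])[0]  # Markovian sample
            delta = r[s] + gamma * theta[nxt] - theta[s]  # TD error
            update[s] += delta / num_agents               # average over agents
            states[i] = nxt
        theta = [theta[s] + alpha * update[s] for s in range(n)]
        if t >= steps // 2:                # average the second half only
            kept += 1
            avg = [a + (th - a) / kept for a, th in zip(avg, theta)]
    return avg
```

Calling `federated_td([[0.9, 0.1], [0.2, 0.8]], [1.0, 0.0], 0.9)` and comparing against `true_values` with the same arguments gives a quick check that the averaged iterate lands near the fixed point of the Bellman equation. The step-size schedule here is tuned by hand, which is exactly the kind of problem-dependent choice the abstract says the proposed method avoids.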

artificial intelligence, machine learning, reinforcement learning, (20 more...)

arXiv.org Artificial Intelligence

2510.07436

Country: Asia (0.28)

Genre: Research Report > New Finding (0.34)

Industry: Retail (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

3d158f054ff0cb83397367234899db07-Paper-Conference.pdf

Neural Information Processing Systems | Oct-9-2025, 23:55:03 GMT

arxiv preprint arxiv, dataset, multimodal llm, (13 more...)

Neural Information Processing Systems

Country:

North America > United States (0.45)
Europe > United Kingdom (0.28)
North America > Canada (0.04)
(4 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Transportation > Passenger (1.00)
Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)
(16 more...)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(5 more...)

Shopping MMLU: A Massive Multi-Task Online Shopping Benchmark for Large Language Models

Neural Information Processing Systems | Oct-9-2025, 20:39:20 GMT

Work done partially during Yilun's internship at Amazon.

correlation, llm, mmlu, (13 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom (0.04)
Europe > Spain (0.04)
Europe > Italy (0.04)
(4 more...)

Genre: Research Report (0.67)

Industry:

Information Technology > Services > e-Commerce Services (0.70)
Retail > Online (0.56)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.98)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.98)

Video Token Merging for Long-form Video Understanding

Lee, Seon-Ho

Neural Information Processing Systems | Oct-9-2025, 19:44:42 GMT

Work done during an internship at Amazon Prime Video. Work done while at Amazon Prime Video.

dataset, information, video token, (15 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (0.93)

Industry: Retail > Online (0.54)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Vision > Video Understanding (0.65)

IKEA-Manual: Seeing Shape Assembly Step by Step

Neural Information Processing Systems | Oct-9-2025, 16:30:32 GMT

Human-designed visual manuals are crucial components in shape assembly activities.

artificial intelligence, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country:

North America > United States (0.14)
Asia (0.04)

Industry:

Retail (0.53)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Robots (0.94)
Information Technology > Artificial Intelligence > Natural Language (0.68)

We Found 136 of the Best Prime Day Deals Still on for 2025: Up to 55% Off

WIRED | Oct-9-2025, 16:29:04 GMT

Amazon's fall Prime Day sale has come and gone, but a few of the best deals are still available. If you are a Prime member who missed out, there's some good news: some leftover deals are still going strong. We're keeping you updated here with the best markdowns on our favorite tech gear and gadgets that are still available, from Alexa-enabled speakers to robot vacs to laptops and tablets. The WIRED Reviews team tests products year-round, and at sales events like this, we only recommend deals on stuff we have actually used and approved. We sorted through thousands of deals by hand to make these picks. The Fire HD 10 is Amazon's best tablet for most people. The current model dates from 2023, but the octa-core processor is plenty fast for consuming Amazon Prime content, which is really the primary reason to buy a Fire tablet. The full HD (1080p) screen won't win any awards, but it's good enough for streaming movies. Fire tablets can do double duty as an Echo speaker, too. Turn on Show Mode (swipe down on the notification overlay and check the Show Mode box) and you can query Alexa to your heart's content.

amazon, deal photograph, photograph, (13 more...)

WIRED

Country:

Pacific Ocean > North Pacific Ocean > Puget Sound (0.04)
North America > United States > California (0.04)
Europe > Spain (0.04)
(2 more...)

Industry:

Semiconductors & Electronics (1.00)
Retail > Online (1.00)
Information Technology (1.00)
(2 more...)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Communications > Mobile (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.46)

Estimating Propensity for Causality-based Recommendation without Exposure Data

Neural Information Processing Systems | Oct-9-2025, 03:24:56 GMT

They aim to recommend an item based on the uplift, also called the causal effect, in the user's behavior (e.g., clicks or purchases) caused by different treatments (i.e., recommending/exposing the item or not).
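
The uplift described in this excerpt can be written down in standard potential-outcomes notation. The symbols below are assumptions chosen for illustration and are not taken from the paper: $Y_{u,i}$ is the behavior of user $u$ toward item $i$ (e.g., a click), $T_{u,i} \in \{0, 1\}$ indicates whether the item was exposed, and $p_{u,i}$ is the propensity named in the title.

```latex
% Uplift (causal effect) of exposing item i to user u:
\tau_{u,i} = Y_{u,i}(T_{u,i} = 1) - Y_{u,i}(T_{u,i} = 0)

% Propensity: the probability that item i is exposed to user u
p_{u,i} = \Pr(T_{u,i} = 1)
```

Only one of the two potential outcomes is observed for any user-item pair, so the uplift is typically estimated rather than measured directly, and such estimators commonly rely on the propensity.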

artificial intelligence, machine learning, propensity score, (17 more...)

Neural Information Processing Systems

Country:

Asia > Singapore (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.93)

Industry: Retail (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

OFCOURSE: A Multi-Agent Reinforcement Learning Environment for Order Fulfillment

Neural Information Processing Systems | Oct-8-2025, 20:59:21 GMT

In particular, we model the integrated problem as a Markov game, wherein a team of agents learns a joint policy via interacting with a simulated environment.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China > Zhejiang Province (0.04)

Genre: Research Report (1.00)

Industry:

Retail (0.68)
Transportation > Freight & Logistics Services (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Osprey backpacks and camping bags hit their lowest prices of the year during Amazon Prime Day

Popular Science | Oct-8-2025, 18:18:33 GMT

PopSci editors are big fans of Osprey outdoor packs and backpacks. Almost all of them are on sale for Prime Day.

black, blue, tunnel vision grey, (9 more...)

Popular Science

Country: Europe (0.05)

Industry: Retail > Online (0.72)

Technology: Information Technology > Artificial Intelligence (0.72)
