Retail
Towards Unified Alignment Between Agents, Humans, and Environment
Yang, Zonghan, Liu, An, Liu, Zijun, Liu, Kaiming, Xiong, Fangzhou, Wang, Yile, Yang, Zeyuan, Hu, Qingyuan, Chen, Xinrui, Zhang, Zhenhe, Luo, Fuwen, Guo, Zhicheng, Li, Peng, Liu, Yang
The rapid progress of foundation models has led to the prosperity of autonomous agents, which leverage the universal capabilities of foundation models to conduct reasoning, decision-making, and environmental interaction. However, the efficacy of agents remains limited when operating in intricate, realistic environments. In this work, we introduce the principles of $\mathbf{U}$nified $\mathbf{A}$lignment for $\mathbf{A}$gents ($\mathbf{UA}^2$), which advocate for the simultaneous alignment of agents with human intentions, environmental dynamics, and self-constraints such as the limitation of monetary budgets. From the perspective of $\mathbf{UA}^2$, we review the current agent research and highlight the neglected factors in existing agent benchmarks and method candidates. We also conduct proof-of-concept studies by introducing realistic features to WebShop, including user profiles to demonstrate intentions, personalized reranking for complex environmental dynamics, and runtime cost statistics to reflect self-constraints. We then follow the principles of $\mathbf{UA}^2$ to propose an initial design of our agent, and benchmark its performance with several candidate baselines in the retrofitted WebShop. The extensive experimental results further prove the importance of the principles of $\mathbf{UA}^2$. Our research sheds light on the next steps of autonomous agent research with improved general problem-solving abilities.
WebLINX: Real-World Website Navigation with Multi-Turn Dialogue
Lù, Xing Han, Kasner, Zdeněk, Reddy, Siva
We propose the problem of conversational web navigation, where a digital agent controls a web browser and follows user instructions to solve real-world tasks in a multi-turn dialogue fashion. To support this problem, we introduce WEBLINX - a large-scale benchmark of 100K interactions across 2300 expert demonstrations of conversational web navigation. Our benchmark covers a broad range of patterns on over 150 real-world websites and can be used to train and evaluate agents in diverse scenarios. Due to the magnitude of information present, Large Language Models (LLMs) cannot process entire web pages in real-time. To solve this bottleneck, we design a retrieval-inspired model that efficiently prunes HTML pages by ranking relevant elements. We use the selected elements, along with screenshots and action history, to assess a variety of models for their ability to replicate human behavior when navigating the web. Our experiments span from small text-only to proprietary multimodal LLMs. We find that smaller finetuned decoders surpass the best zero-shot LLMs (including GPT-4V), but also larger finetuned multimodal models which were explicitly pretrained on screenshots. However, all finetuned models struggle to generalize to unseen websites. Our findings highlight the need for large multimodal models that can generalize to novel settings. Our code, data and models are available for research: https://mcgill-nlp.github.io/weblinx
Learning to Generate Explainable Stock Predictions using Self-Reflective Large Language Models
Koa, Kelvin J. L., Ma, Yunshan, Ng, Ritchie, Chua, Tat-Seng
Explaining stock predictions is generally a difficult task for traditional non-generative deep learning models, where explanations are limited to visualizing the attention weights on important texts. Today, Large Language Models (LLMs) present a solution to this problem, given their known capabilities to generate human-readable explanations for their decision-making process. However, the task of stock prediction remains challenging for LLMs, as it requires the ability to weigh the varying impacts of chaotic social texts on stock prices. The problem gets progressively harder with the introduction of the explanation component, which requires LLMs to explain verbally why certain factors are more important than the others. On the other hand, to fine-tune LLMs for such a task, one would need expert-annotated samples of explanation for every stock movement in the training set, which is expensive and impractical to scale. To tackle these issues, we propose our Summarize-Explain-Predict (SEP) framework, which utilizes a self-reflective agent and Proximal Policy Optimization (PPO) to let an LLM teach itself how to generate explainable stock predictions in a fully autonomous manner. The reflective agent learns how to explain past stock movements through self-reasoning, while the PPO trainer trains the model to generate the most likely explanations from input texts. The training samples for the PPO trainer are also the responses generated during the reflective process, which eliminates the need for human annotators. Using our SEP framework, we fine-tune an LLM that can outperform both traditional deep-learning and LLM methods in prediction accuracy and Matthews correlation coefficient for the stock classification task. To justify the generalization capability of our framework, we further test it on the portfolio construction task, and demonstrate its effectiveness through various portfolio metrics.
An Exploration of Clustering Algorithms for Customer Segmentation in the UK Retail Market
John, Jeen Mary, Shobayo, Olamilekan, Ogunleye, Bayode
Recently, people's awareness of online purchases has significantly risen. This has given rise to online retail platforms and the need for a better understanding of customer purchasing behaviour. Retail companies are pressed with the need to deal with a high volume of customer purchases, which requires sophisticated approaches to perform more accurate and efficient customer segmentation. Customer segmentation is a marketing analytical tool that aids customer-centric service and thus enhances profitability. In this paper, we aim to develop a customer segmentation model to improve decision-making processes in the retail market industry. To achieve this, we employed a UK-based online retail dataset obtained from the UCI machine learning repository. The retail dataset consists of 541,909 customer records and eight features. Our study adopted the RFM (recency, frequency, and monetary) framework to quantify customer values. Thereafter, we compared several state-of-the-art (SOTA) clustering algorithms, namely, K-means clustering, the Gaussian mixture model (GMM), density-based spatial clustering of applications with noise (DBSCAN), agglomerative clustering, and balanced iterative reducing and clustering using hierarchies (BIRCH). The results showed the GMM outperformed other approaches, with a Silhouette Score of 0.80.
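As an illustration of the RFM step mentioned in the abstract above, the following is a minimal sketch of how recency, frequency, and monetary values might be derived per customer before any clustering is applied. The function name `rfm_table`, the tuple layout of the transactions, and the toy data are all assumptions made for this example; they are not taken from the paper or from the UCI dataset.

```python
from collections import defaultdict
from datetime import date


def rfm_table(transactions, as_of):
    """Return {customer_id: (recency_days, frequency, monetary)}.

    transactions: iterable of (customer_id, purchase_date, amount) tuples
                  (a hypothetical layout chosen for this sketch).
    as_of:        reference date from which recency is measured.
    """
    last_purchase = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)

    for customer_id, purchase_date, amount in transactions:
        # Recency depends only on the most recent purchase date.
        if customer_id not in last_purchase or purchase_date > last_purchase[customer_id]:
            last_purchase[customer_id] = purchase_date
        # Frequency counts purchases; monetary sums what was spent.
        frequency[customer_id] += 1
        monetary[customer_id] += amount

    return {
        cid: ((as_of - last_purchase[cid]).days, frequency[cid], round(monetary[cid], 2))
        for cid in last_purchase
    }


# Hypothetical toy data, not drawn from the dataset in the paper.
transactions = [
    ("C1", date(2011, 11, 1), 20.0),
    ("C1", date(2011, 12, 1), 35.5),
    ("C2", date(2011, 6, 15), 120.0),
]
rfm = rfm_table(transactions, as_of=date(2011, 12, 10))
```

Each customer ends up with a three-value vector; vectors of this kind are what a clustering algorithm such as K-means or a GMM would then group into segments, typically after scaling the three features to comparable ranges.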
V-IRL: Grounding Virtual Intelligence in Real Life
Yang, Jihan, Ding, Runyu, Brown, Ellis, Qi, Xiaojuan, Xie, Saining
There is a sensory gulf between the Earth that humans inhabit and the digital realms in which modern AI agents are created. To develop AI agents that can sense, think, and act as flexibly as humans in real-world settings, it is imperative to bridge the realism gap between the digital and physical worlds. How can we embody agents in an environment as rich and diverse as the one we inhabit, without the constraints imposed by real hardware and control? Towards this end, we introduce V-IRL: a platform that enables agents to scalably interact with the real world in a virtual yet realistic environment. Our platform serves as a playground for developing agents that can accomplish various practical tasks and as a vast testbed for measuring progress in capabilities spanning perception, decision-making, and interaction with real-world data across the entire globe.
8 Best Valentine's Day Deals: Headphones, MacBooks, and a Lego Kit
Whether you're shopping for your significant other, your kid, or yourself, Valentine's Day is the perfect time to pick up some little treats. We've found a few great sales happening right now on some of our favorite gadgets and gizmos, which also make great gifts. Be sure to check our deals from earlier in the week, which are still on sale, including discounted Prana apparel, plus robot vacuums and smartwatches.
I tried Apple's $3,500 Vision Pro... and it let me attend an Alicia Keys recording session, walk a tightrope and pet a dinosaur without leaving a New York City store - but it wasn't worth 5 times the price of Meta's Oculus headset
I attended an Alicia Keys recording session, walked a tightrope and pet a dinosaur without leaving an Apple Store in New York City. The tech giant officially launched the new $3,500 Vision Pro in-store Friday and I was able to try out the headset this morning, along with dozens of others who waited patiently outside before the doors opened. But unlike the crowd, I was not expecting to enjoy the experience. I am somewhat of a Luddite and just could not imagine needing or wanting a mixed-reality headset - much less one that costs more than my month's rent in Brooklyn. However, each experience during the 20-minute demo felt shockingly intimate, vivid, and up close - I actually felt self-conscious about maybe having haphazardly stumbled my way into multiple dangerous or V.I.P. locations that I wasn't supposed to be in.
Amazon profits surge on strong trading season and cloud computing growth
Profits at Amazon have surged on strong seasonal trading and robust growth in its powerhouse cloud computing business. The world's largest retailer generated revenue of $170bn in the three months to December, up 14% on the same period of 2022, and clearing expectations on Wall Street of some $166bn. Net income hit $10.6bn in the fourth quarter, from $278m a year previously, after the company moved to cut costs and draw a line under years of rapid expansion following the onset of the pandemic. Earnings per share hit $1.03. Shares in the business rose 5.5% during out-of-hours trading in New York.
Amazon launches Rufus, an AI-powered shopping assistant
Amazon launched a new generative AI shopping assistant, Rufus, on Thursday. The chatbot is trained on Amazon's product catalog, customer reviews, community Q&As and "information from across the web." It's only available to a limited set of Amazon customers for now but will expand in the coming weeks. The company views the assistant as customers' one-stop shop for all their shopping needs. Rufus can answer questions like, "What to consider when buying running shoes?" and display comparisons for things such as, "What are the differences between trail and road running shoes?"
Anthropomorphism and Human-Robot Interaction
Robots are fast becoming a part of everyday life. Indeed, robots are now deployed in retail stores (see Figure 1), warehouses, hospitals, factories, and so on to perform tasks conventionally done by humans. Nestlé uses a humanoid robot "Pepper" to sell coffee makers in department stores in Japan; people buy ice cream from a fully automated ice cream franchise, RoboFusion; Cobalt's KnightScope security robots patrol streets in New York City. Such encounters will only increase as the global market for service robots grows exponentially, from $36.2 billion in 2022 to a projected $103.3 billion by 2026. [26] According to a survey from McKinsey Global Institute, 15% of the global workforce, or 400 million workers, will be displaced by 2030. [14]
Figure caption: Humanoid "Pepper" has been deployed in many retail stores throughout Japan. [20]