 Retail


Alexa is about to send everything you tell it to Amazon

Popular Science

Amazon's Alexa+ service is rolling out on March 28, and with it supposedly comes a more personalized, intuitive, and powerful digital assistant thanks to its underlying generative AI technology. But for the new features to work, the company is asking a lot from its Echo and smart device users--whether or not they choose to use Alexa+ at all. Alexa+ is billed as a major upgrade that includes individual voice recognition through Alexa Voice ID, nuanced calendar scheduling, Ring home security system integrations, and product purchasing capabilities. It's Amazon's latest effort to generate a profit from Alexa, which lost $25 billion in revenue between 2007-2021 according to The Wall Street Journal last year. While Alexa+ will be added to all Prime subscriptions, users without Prime can enroll in the program for $19.99 per month.


Faithfulness of LLM Self-Explanations for Commonsense Tasks: Larger Is Better, and Instruction-Tuning Allows Trade-Offs but Not Pareto Dominance

arXiv.org Artificial Intelligence

As large language models (LLMs) become increasingly capable, ensuring that their self-generated explanations are faithful to their internal decision-making process is critical for safety and oversight. In this work, we conduct a comprehensive counterfactual faithfulness analysis across 62 models from 8 families, encompassing both pretrained and instruction-tuned variants and significantly extending prior studies of counterfactual tests. We introduce phi-CCT, a simplified variant of the Correlational Counterfactual Test, which avoids the need for token probabilities while explaining most of the variance of the original test. Our findings reveal clear scaling trends: larger models are consistently more faithful on our metrics. However, when comparing instruction-tuned and human-imitated explanations, we find that observed differences in faithfulness can often be attributed to explanation verbosity, leading to shifts along the true-positive/false-positive Pareto frontier. While instruction-tuning and prompting can influence this trade-off, we find limited evidence that they fundamentally expand the frontier of explanatory faithfulness beyond what is achievable with pretrained models of comparable size. Our analysis highlights the nuanced relationship between instruction-tuning, verbosity, and the faithful representation of model decision processes.
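A rough illustration of the kind of statistic the abstract describes: the sketch below computes a phi coefficient (the Pearson correlation of two binary variables) between "did the counterfactual edit change the model's prediction?" and "did the explanation mention the edit?". The abstract does not spell out the exact definition of phi-CCT, so the function name, the two boolean inputs, and the handling of the degenerate case are assumptions made for illustration, not the authors' implementation.

```python
import math


def phi_coefficient(changed, mentioned):
    """Phi coefficient between two equal-length sequences of booleans.

    changed[i]   -- hypothetical flag: the intervention flipped the prediction
    mentioned[i] -- hypothetical flag: the explanation mentions the intervention
    """
    if len(changed) != len(mentioned):
        raise ValueError("both sequences must have the same length")

    # Fill the 2x2 contingency table.
    n11 = sum(1 for c, m in zip(changed, mentioned) if c and m)
    n10 = sum(1 for c, m in zip(changed, mentioned) if c and not m)
    n01 = sum(1 for c, m in zip(changed, mentioned) if not c and m)
    n00 = sum(1 for c, m in zip(changed, mentioned) if not c and not m)

    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        # Phi is undefined when either variable is constant;
        # returning 0.0 is a convention chosen for this sketch.
        return 0.0
    return (n11 * n00 - n10 * n01) / denom


# Explanations that mention the edit exactly when it mattered score 1.0.
print(phi_coefficient([True, True, False, False], [True, True, False, False]))
```

Because both inputs are binary, no token probabilities are needed, which matches the motivation stated in the abstract.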


One of the most frustrating problems at work: solved

Popular Science

It's 2025, and converting files from one format to another should only take a few clicks. But it often becomes a whole lengthy process requiring uploads to unsecured online converting apps that can put your personal information at risk. Usually, this PDF conversion license is $99.99, but right now, it's down to $23.99 when you use code SAVE20 at checkout. PDF Converter Pro works with Microsoft Word, Excel, PowerPoint, Text, HTML, PNG, and JPG files. It even maintains your original layouts, images, and hyperlinks after conversion without losing quality.


The Morning After: Is the Roomba an endangered species?

Engadget

The company behind Roomba robovacs told investors earlier this week that revenue was substantially down and it's struggling to pay its debts. Amazon was briefly tapped to acquire the robot company iRobot, but the threat of a European Commission investigation led to the retailer terminating the deal -- apparently happy enough to pay off the $94 million termination fee. That, however, isn't enough to tackle the $200 million loan iRobot took out to survive long enough for Amazon to come to the rescue. It's extra rough given that the company announced, just the week before, a bunch of new models, including a new Roomba that can compact debris and dust, so it only needs to be emptied every few weeks. At the same time, rival robot vacuum cleaners are getting more versatile, more complicated and more intriguing.


LLM-Pack: Intuitive Grocery Handling for Logistics Applications

arXiv.org Artificial Intelligence

Yannik Blei, Michael Krawez, Tobias Jülg, Pierre Krack, Florian Walter, and Wolfram Burgard. Abstract: Robotics and automation are increasingly influential in logistics but remain largely confined to traditional warehouses. In grocery retail, advancements such as cashier-less supermarkets exist, yet customers still manually pick and pack groceries. While there has been a substantial focus in robotics on the bin picking problem, the task of packing objects and groceries has remained largely untouched. However, packing grocery items in the right order is crucial for preventing product damage, e.g., heavy objects should not be placed on top of fragile ones. However, the exact criteria for the right packing order are hard to define, in particular given the huge variety of objects typically found in stores. In this paper, we introduce LLM-Pack, a novel approach for grocery packing. LLM-Pack leverages language and vision foundation models for identifying groceries and generating a packing sequence that mimics human packing strategy. LLM-Pack does not require dedicated training to handle new grocery items and its modularity allows easy upgrades of the underlying foundation models. We extensively evaluate our approach to demonstrate its performance.


Aligning to What? Limits to RLHF Based Alignment

arXiv.org Artificial Intelligence

Reinforcement Learning from Human Feedback (RLHF) is increasingly used to align large language models (LLMs) with human preferences. However, the effectiveness of RLHF in addressing underlying biases remains unclear. This study investigates the relationship between RLHF and both covert and overt biases in LLMs, particularly focusing on biases against African Americans. We applied various RLHF techniques (DPO, ORPO, and RLOO) to Llama 3 8B and evaluated the covert and overt biases of the resulting models using matched-guise probing and explicit bias testing. We performed additional tests with DPO on different base models and datasets; among several implications, we found that SFT before RLHF calcifies model biases. Additionally, we extend the tools for measuring biases to multi-modal models. Through our experiments we collect evidence that indicates that current alignment techniques are inadequate for nebulous tasks such as mitigating covert biases, highlighting the need for capable datasets, data curating techniques, or alignment tools.


ToolFuzz -- Automated Agent Tool Testing

arXiv.org Artificial Intelligence

Large Language Model (LLM) Agents leverage the advanced reasoning capabilities of LLMs in real-world applications. To interface with an environment, these agents often rely on tools, such as web search or database APIs. As the agent provides the LLM with tool documentation alongside the user query, the completeness and correctness of this documentation are critical. However, tool documentation is often over-, under-, or ill-specified, impeding the agent's accuracy. Standard software testing approaches struggle to identify these errors as they are expressed in natural language. Thus, despite its importance, there currently exists no automated method to test the tool documentation for agents. To address this issue, we present ToolFuzz, the first method for automated testing of tool documentation. ToolFuzz is designed to discover two types of errors: (1) user queries leading to tool runtime errors and (2) user queries that lead to incorrect agent responses. ToolFuzz can generate a large and diverse set of natural inputs, effectively finding tool description errors at a low false positive rate. Further, we present two straightforward prompt-engineering approaches. We evaluate all three tool testing approaches on 32 common LangChain tools and 35 newly created custom tools and 2 novel benchmarks to further strengthen the assessment. We find that many publicly available tools suffer from underspecification. Specifically, we show that ToolFuzz identifies 20x more erroneous inputs compared to the prompt-engineering approaches, making it a key component for building reliable AI agents.


Who bought this smoked salmon? How 'AI agents' will change the internet (and shopping lists)

The Guardian

Armed with my shopping list, it types each item into the search bar of a supermarket website, then uses its cursor to click. Watching what appears to be a digital ghost do this usually mundane task is strangely transfixing. "Are you sure it's not just a person in India?" my husband asks, peering over my shoulder. Made available to UK users last month, it has a similar text interface and conversational tone to ChatGPT, but rather than just answering questions, it can actually do things – provided they involve navigating a web browser. Hot on the heels of large language models, AI agents have been trumpeted as the next big thing, and you can see the appeal: a digital assistant that can complete practical tasks is more compelling than one that can just talk back.


Google stuffs even more AI tools into online shopping

Engadget

As much money as Big Tech is sinking into generative AI, it's no surprise to see more AI-powered tools materializing to valiantly assist you in spending your hard-earned cash. Once a wee Google Labs experiment, Vision Match has graduated into the mainstream. The AI feature, which arrived for testers in 2023, lets you describe a garment you're picturing in your own words and find the best available matches. If that sounds like "googling it with extra steps," well, it is. But AI-generated images serve as a bridge between your words and the products you may eventually buy -- one that hopefully produces results that better fit what you had in mind.


Seeded Poisson Factorization: Leveraging domain knowledge to fit topic models

arXiv.org Artificial Intelligence

Topic models are widely used for discovering latent thematic structures in large text corpora, yet traditional unsupervised methods often struggle to align with predefined conceptual domains. This paper introduces Seeded Poisson Factorization (SPF), a novel approach that extends the Poisson Factorization framework by incorporating domain knowledge through seed words. SPF enables a more interpretable and structured topic discovery by modifying the prior distribution of topic-specific term intensities, assigning higher initial rates to predefined seed words. The model is estimated using variational inference with stochastic gradient optimization, ensuring scalability to large datasets. We apply SPF to an Amazon customer feedback dataset, leveraging predefined product categories as guiding structures. Our evaluation demonstrates that SPF achieves superior classification performance compared to alternative guided topic models, particularly in terms of computational efficiency and predictive performance. Furthermore, robustness checks highlight SPF's ability to adaptively balance domain knowledge and data-driven topic discovery, even in cases of imperfect seed word selection. These results establish SPF as a powerful and scalable alternative for integrating expert knowledge into topic modeling, enhancing both interpretability and efficiency in real-world applications.
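To make the seeding idea concrete, here is a minimal sketch of how a prior over topic-specific term intensities might be raised for seed words. The abstract does not give the parameterization, so the additive boost, the function name `seeded_prior_means`, the default values, and the example categories are all assumptions for illustration; the paper's actual prior and its variational inference are not reproduced here.

```python
def seeded_prior_means(vocab, seeds_by_topic, base_mean=0.5, seed_boost=2.0):
    """Return a dict mapping each topic to a list of prior mean intensities.

    Every term starts at `base_mean`; terms listed as seeds for a topic
    get `seed_boost` added (an assumed, purely illustrative scheme).
    Seed words that are not in `vocab` are silently ignored.
    """
    prior = {}
    for topic, seeds in seeds_by_topic.items():
        seed_set = set(seeds)
        prior[topic] = [
            base_mean + (seed_boost if word in seed_set else 0.0)
            for word in vocab
        ]
    return prior


# Hypothetical vocabulary and product categories.
vocab = ["battery", "screen", "cotton", "price"]
seeds = {
    "electronics": ["battery", "screen"],
    "clothing": ["cotton"],
}
for topic, means in seeded_prior_means(vocab, seeds).items():
    print(topic, means)
```

In a full model these values would only initialize or parameterize the prior; the fitted intensities can still move away from them, which is consistent with the abstract's remark about balancing domain knowledge against data-driven discovery.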