Media
Harmful Traits of AI Companions
Knox, W. Bradley, Bradford, Katie, Castro, Samanta Varela, Ong, Desmond C., Williams, Sean, Romanow, Jacob, Nations, Carly, Stone, Peter, Baker, Samuel
Amid the growing prevalence of human-AI interaction, large language models and other AI-based entities increasingly provide forms of companionship to human users. Such AI companionship -- i.e., bonded relationships between humans and AI systems that resemble the relationships people have with family members, friends, and romantic partners -- might substantially benefit humans. Yet such relationships can also do profound harm. We propose a framework for analyzing potential negative impacts of AI companionship by identifying specific harmful traits of AI companions and speculatively mapping causal pathways back from these traits to possible causes and forward to potential harmful effects. We provide detailed, structured analysis of four potentially harmful traits -- the absence of natural endpoints for relationships, vulnerability to product sunsetting, high attachment anxiety, and propensity to engender protectiveness -- and briefly discuss fourteen others. For each trait, we propose hypotheses connecting causes -- such as misaligned optimization objectives and the digital nature of AI companions -- to fundamental harms -- including reduced autonomy, diminished quality of human relationships, and deception. Each hypothesized causal connection identifies a target for potential empirical evaluation. Our analysis examines harms at three levels: to human partners directly, to their relationships with other humans, and to society broadly. We examine how existing law struggles to address these emerging harms, discuss potential benefits of AI companions, and conclude with design recommendations for mitigating risks. This analysis offers immediate suggestions for reducing risks while laying a foundation for deeper investigation of this critical but understudied topic.
CrossVid: A Comprehensive Benchmark for Evaluating Cross-Video Reasoning in Multimodal Large Language Models
Li, Jingyao, Wang, Jingyun, Tan, Molin, Wang, Haochen, Yan, Cilin, Shi, Likun, Cai, Jiayin, Jiang, Xiaolong, Hu, Yao
Cross-Video Reasoning (CVR) presents a significant challenge in video understanding, which requires simultaneous understanding of multiple videos to aggregate and compare information across groups of videos. Most existing video understanding benchmarks focus on single-video analysis, failing to assess the ability of multimodal large language models (MLLMs) to simultaneously reason over various videos. Recent benchmarks evaluate MLLMs' capabilities on multi-view videos that capture different perspectives of the same scene. However, their limited tasks hinder a thorough assessment of MLLMs in diverse real-world CVR scenarios. To this end, we introduce CrossVid, the first benchmark designed to comprehensively evaluate MLLMs' spatial-temporal reasoning ability in cross-video contexts. Firstly, CrossVid encompasses a wide spectrum of hierarchical tasks, comprising four high-level dimensions and ten specific tasks, thereby closely reflecting the complex and varied nature of real-world video understanding. Secondly, CrossVid provides 5,331 videos, along with 9,015 challenging question-answering pairs, spanning single-choice, multiple-choice, and open-ended question formats. Through extensive experiments on various open-source and closed-source MLLMs, we observe that Gemini-2.5-Pro performs best on CrossVid, achieving an average accuracy of 50.4%. Notably, our in-depth case study demonstrates that most current MLLMs struggle with CVR tasks, primarily due to their inability to integrate or compare evidence distributed across multiple videos for reasoning. These insights highlight the potential of CrossVid to guide future advancements in enhancing MLLMs' CVR capabilities.
Extracting memorized pieces of (copyrighted) books from open-weight language models
Cooper, A. Feder, Gokaslan, Aaron, Ahmed, Ahmed, Cyphert, Amy B., De Sa, Christopher, Lemley, Mark A., Ho, Daniel E., Liang, Percy
Plaintiffs and defendants in copyright lawsuits over generative AI often make sweeping, opposing claims about the extent to which large language models (LLMs) have memorized plaintiffs' protected expression in their training data. Drawing on both machine learning and copyright law, we show that these polarized positions dramatically oversimplify the relationship between memorization and copyright. To do so, we extend a recent probabilistic extraction technique to measure memorization of 50 books in 17 open-weight LLMs. Through thousands of experiments, we show that the extent of memorization varies both by model and by book. With respect to our specific extraction methodology, we find that most LLMs do not memorize most books -- either in whole or in part. However, we also find that Llama 3.1 70B entirely memorizes some books, like the first Harry Potter book and 1984. In fact, the first Harry Potter is so memorized that, using a seed prompt consisting of just the first few tokens of the first chapter, we can deterministically generate the entire book near-verbatim. We discuss why our results have significant implications for copyright cases, though not ones that unambiguously favor either side.
AI-Assisted Conversational Interviewing: Effects on Data Quality and Respondent Experience
Barari, Soubhik, Angbazo, Jarret, Wang, Natalie, Christian, Leah M., Dean, Elizabeth, Slowinski, Zoe, Sepulvado, Brandon
Standardized surveys scale efficiently but sacrifice depth, while conversational interviews improve response quality at the cost of scalability and consistency. This study bridges the gap between these methods by introducing a framework for AI-assisted conversational interviewing. To evaluate this framework, we conducted a web survey experiment where 1,800 participants were randomly assigned to AI 'chatbots' which use large language models (LLMs) to dynamically probe respondents for elaboration and interactively code open-ended responses to fixed questions developed by human researchers. We assessed the AI chatbot's performance in terms of coding accuracy, response quality, and respondent experience. Our findings reveal that AI chatbots perform moderately well in live coding even without survey-specific fine-tuning, despite slightly inflated false positive errors due to respondent acquiescence bias. Open-ended responses were more detailed and informative, but this came at a slight cost to respondent experience. Our findings highlight the feasibility of using AI methods such as chatbots enhanced by LLMs to enhance open-ended data collection in web surveys.
RealWebAssist: A Benchmark for Long-Horizon Web Assistance with Real-World Users
Ye, Suyu, Shi, Haojun, Shih, Darren, Yun, Hyokun, Roosta, Tanya, Shu, Tianmin
To achieve successful assistance with long-horizon web-based tasks, AI agents must be able to sequentially follow real-world user instructions over a long period. Unlike existing web-based agent benchmarks, sequential instruction following in the real world poses significant challenges beyond performing a single, clearly defined task. For instance, real-world human instructions can be ambiguous, require different levels of AI assistance, and may evolve over time, reflecting changes in the user's mental state. To address this gap, we introduce RealWebAssist, a novel benchmark designed to evaluate sequential instruction-following in realistic scenarios involving long-horizon interactions with the web, visual GUI grounding, and understanding ambiguous real-world user instructions. RealWebAssist includes a dataset of sequential instructions collected from real-world human users. Each user instructs a web-based assistant to perform a series of tasks on multiple websites. A successful agent must reason about the true intent behind each instruction, keep track of the mental state of the user, understand user-specific routines, and ground the intended tasks to actions on the correct GUI elements. Our experimental results show that state-of-the-art models struggle to understand and ground user instructions, posing critical challenges in following real-world user instructions for long-horizon web assistance.
Netflix quietly makes major change to platform with no warning as fans rage over 'customer hostile' policy
The 110 very best Cyber Monday deals in the US, curated and vetted
These are products we believe are worth purchasing year-round - the discount is just a bonus. Our experts found the best deals and sales that are actually worth your money. The Guardian's journalism is independent. We will earn a commission if you buy something through an affiliate link. We all have holiday traditions we look forward to each year: cooking your grandmother's classic stuffing recipe. With the influx of these so-called "doorbuster deals", it can be hard to know a true steal from a modest markdown. So we've asked shopping experts to curate the best Black Friday and Cyber Monday sales across five of the most-shopped for categories. Whether you're looking for a much-needed sleep upgrade or a cordless vacuum that'll stand the test of time, below is our list of the best Cyber Monday deals across streaming, home, kitchen, tech, travel and wellness products. These are items that normally add up fast but right now are going for prices you won't wince at. To put together our list of deals, we enlisted the help of Guardian contributors with years of experience testing products ranging from blenders to vacuums. Our recommendations are based on items tested and loved by our contributors and staff.
It's last call on Cyber Monday desktop speaker deals, so don't miss out
You're not going to find better sound on a budget than these deeply discounted desktop speakers. We may earn revenue from the products available on this page and participate in affiliate programs. Ready to retire your tinny TV speakers and sad little laptop drivers? Powered speakers are the easiest way to upgrade your listening station, whether you're streaming playlists, watching movies, or spinning vinyl. And with the right connections, there's no receiver required.
Medieval shipwreck mistaken for underwater 'rubbish'
Loaded with grave slabs, the 13th century English ship was dragged to a grave of its own. After centuries at the bottom of the English Channel, remnants from one of England's oldest surviving shipwrecks are finally back on shore. Yet the reason it took maritime archaeologists this long to retrieve items from the 13th century Mortar Wreck was not because of its depth or the ravages of time. The shipwreck was mistaken for modern construction debris.