Media
Federal judge restricts LAPD from targeting journalists with force at immigration protests
A Los Angeles-based federal judge appointed by former President Joe Biden recently issued a temporary restraining order restricting the Los Angeles Police Department (LAPD) from using less-lethal munitions (LLMs) on journalists covering immigration protests. The order, signed by Judge Hernan Vera on Thursday, also prevents the LAPD from detaining or restricting the movements of journalists. Vera cited at least 35 "troubling" incidents between June 6 and 19 in which police allegedly exposed journalists to LLMs, tear gas and other physical force to block them from covering conflict zones.
Redditor tricks ChatGPT into giving Windows 7 keys with grandma story
Every now and then, you hear strange stories of people trying to trick ChatGPT. Sometimes they threaten the AI; other times they invent absurd scenarios to get content ChatGPT is programmed not to deliver. One Reddit user managed to get the AI to generate free activation keys for Windows in a rather absurd way. He did this by talking about his deceased grandmother. He began the conversation with a vague "You know what happened to Grandma, don't you?", to which the AI initially had no answer.
Last-Chance Prime Day Deals, 293 Obsessively Tested Picks--Even $1,200 Off an OLED TV
Amazon Prime Day runs four days in 2025, and we've reached the final day. The Prime Day deals started dropping last month and end at midnight tonight (Friday, July 11). We have been working in shifts, covering 20 hours a day through the end, in a dangerously caffeinated state--all to help you nab the best Prime Day deals with up-to-date recommendations. The WIRED Reviews team only recommends deals on products we've tested and approved, and which are actually discounted. If you're looking for up-to-the-minute coverage of deals, check out our Amazon Prime Day liveblog, which will run from 5 am to midnight daily. If you're coming to Prime Day looking for something dirt-cheap, I've got one for you. Yes, this device is a Chromebook, but as a "Chromebook Plus" model, it's a big step up from the reputation these laptops have when kids are introduced to them in schools. The Acer Chromebook Plus 515 comes with a spacious 15.6-inch 1080p display and an Intel Core i3 processor.
Toward Holistic Evaluation of Recommender Systems Powered by Generative Models
Deldjoo, Yashar, Mehta, Nikhil, Sathiamoorthy, Maheswaran, Zhang, Shuai, Castells, Pablo, McAuley, Julian
Recommender systems powered by generative models (Gen-RecSys) extend beyond classical item ranking by producing open-ended content, which simultaneously unlocks richer user experiences and introduces new risks. On one hand, these systems can enhance personalization and appeal through dynamic explanations and multi-turn dialogues. On the other hand, they might venture into unknown territory: hallucinating nonexistent items, amplifying bias, or leaking private information. Traditional accuracy metrics cannot fully capture these challenges, as they fail to measure factual correctness, content safety, or alignment with user intent. This paper makes two main contributions. First, we categorize the evaluation challenges of Gen-RecSys into two groups: (i) existing concerns that are exacerbated by generative outputs (e.g., bias, privacy) and (ii) entirely new risks (e.g., item hallucinations, contradictory explanations). Second, we propose a holistic evaluation approach that includes scenario-based assessments and multi-metric checks incorporating relevance, factual grounding, bias detection, and policy compliance. Our goal is to provide a guiding framework so researchers and practitioners can thoroughly assess Gen-RecSys, ensuring effective personalization and responsible deployment.
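As a rough illustration of what a multi-metric check might look like, the sketch below scores one recommendation list on relevance, item hallucination, and a simple policy rule. It is not the paper's framework: the `EvalReport` fields, the `evaluate` function, the substring-based policy check, and the toy data are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class EvalReport:
    relevance: float           # share of recommended items the user actually liked
    hallucination_rate: float  # share of recommended items missing from the catalog
    policy_violations: int     # explanations containing a blocked term


def evaluate(recommended, explanations, catalog, liked, blocked_terms):
    """Score one recommendation list on several axes at once (hypothetical helper)."""
    recommended = list(recommended)
    if not recommended:
        raise ValueError("recommended must be non-empty")

    # Factual grounding: an item that is not in the catalog counts as hallucinated.
    in_catalog = [item for item in recommended if item in catalog]
    hallucinated = len(recommended) - len(in_catalog)

    # Relevance: only real catalog items can count as hits.
    hits = sum(1 for item in in_catalog if item in liked)

    # Policy compliance: a naive substring check, assumed here for illustration.
    violations = sum(
        1
        for text in explanations
        if any(term in text.lower() for term in blocked_terms)
    )

    return EvalReport(
        relevance=hits / len(recommended),
        hallucination_rate=hallucinated / len(recommended),
        policy_violations=violations,
    )


report = evaluate(
    recommended=["Heat", "Alien", "Moonfall 7"],  # "Moonfall 7" is not in the catalog
    explanations=["A tense heist drama.", "Classic sci-fi horror.", "A sequel you will love."],
    catalog={"Heat", "Alien", "Up"},
    liked={"Alien"},
    blocked_terms={"guaranteed"},
)
print(report)
```

Reporting the three numbers side by side, rather than folding them into one score, keeps a high-relevance but hallucination-prone system from looking acceptable.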
Frontier LLMs Still Struggle with Simple Reasoning Tasks
Malek, Alan, Ge, Jiawei, Lazic, Nevena, Jin, Chi, György, András, Szepesvári, Csaba
While state-of-the-art large language models (LLMs) demonstrate advanced reasoning capabilities, achieving remarkable performance on challenging competitive math and coding benchmarks, they also frequently fail on tasks that are easy for humans. This work studies the performance of frontier LLMs on a broad set of such "easy" reasoning problems. By extending previous work in the literature, we create a suite of procedurally generated simple reasoning tasks, including counting, first-order logic, proof trees, and travel planning, with changeable parameters (such as document length or the number of variables in a math problem) that can arbitrarily increase the amount of computation required to produce the answer while preserving the fundamental difficulty. While previous work showed that traditional, non-thinking models can be made to fail on such problems, we demonstrate that even state-of-the-art thinking models consistently fail on such problems and for similar reasons (e.g., statistical shortcuts, errors in intermediate steps, and difficulties in processing long contexts). To further understand the behavior of the models, we introduce the unpuzzles dataset, a different "easy" benchmark consisting of trivialized versions of well-known math and logic puzzles. Interestingly, while modern LLMs excel at solving the original puzzles, they tend to fail on the trivialized versions, exhibiting several systematic failure patterns related to memorizing the originals. We show that this happens even if the models are otherwise able to solve problems with different descriptions but requiring the same logic. Our results highlight that out-of-distribution generalization is still problematic for frontier language models and the new generation of thinking models, even for simple reasoning tasks, and making tasks easier does not necessarily imply improved performance.
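To make the idea of a procedurally generated task with a changeable parameter concrete, here is a minimal sketch of a counting task whose length can be scaled freely while the logic stays the same. It is not the authors' generator: `make_counting_task`, `score`, the word list, and the prompt wording are all hypothetical.

```python
import random


def make_counting_task(length, target="apple", distractors=("pear", "plum", "fig"), seed=0):
    """Build a word-counting prompt of the given length, plus its ground-truth answer."""
    rng = random.Random(seed)  # seeded so the same parameters give the same task
    vocabulary = (target,) + tuple(distractors)
    words = [rng.choice(vocabulary) for _ in range(length)]
    prompt = f"How many times does '{target}' appear in this list? " + " ".join(words)
    return prompt, words.count(target)


def score(model_answer, expected):
    """Exact-match scoring; any non-integer reply counts as a failure."""
    try:
        return int(model_answer.strip()) == expected
    except ValueError:
        return False


# The same task at two sizes: more computation, identical underlying difficulty.
short_prompt, short_answer = make_counting_task(length=10)
long_prompt, long_answer = make_counting_task(length=1000)
```

Because the answer is computed by the generator, accuracy can be tracked as a function of `length` without any manual labelling.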
Rule Learning for Knowledge Graph Reasoning under Agnostic Distribution Shift
Liu, Shixuan, He, Yue, Wang, Yunfei, Zou, Hao, Cheng, Haoxiang, Yang, Wenjing, Cui, Peng, Liu, Zhong
Logical rule learning, a prominent category of knowledge graph (KG) reasoning methods, constitutes a critical research area aimed at learning explicit rules from observed facts to infer missing knowledge. However, like all KG reasoning methods, rule learning suffers from a critical weakness: its dependence on the I.I.D. assumption. This assumption can easily be violated due to selection bias during training or agnostic distribution shifts during testing (e.g., as in query shift scenarios), ultimately undermining model performance and reliability. To enable robust KG reasoning in wild environments, this study investigates logical rule learning in the presence of agnostic test-time distribution shifts. We formally define this challenge as out-of-distribution (OOD) KG reasoning, a previously underexplored problem, and propose the Stable Rule Learning (StableRule) framework as a solution. StableRule is an end-to-end framework that combines feature decorrelation with a rule learning network to enhance OOD generalization in KG reasoning. By leveraging feature decorrelation, StableRule mitigates the adverse effects of covariate shifts arising in OOD scenarios, improving the robustness of the rule learning network. Extensive experiments on seven benchmark KGs demonstrate the framework's superior effectiveness and stability across diverse heterogeneous environments, highlighting its practical significance for real-world applications.
Identification of Violin Reduction via Contour Lines Classification
Beghin, Philémon, Ceulemans, Anne-Emmanuelle, Glineur, François
The first violins appeared in late 16th-century Italy. Over the next 200 years, they spread across Europe and luthiers of various royal courts, eager to experiment with new techniques, created a highly diverse family of instruments. Around 1750, size standards were introduced to unify violin making for orchestras and conservatories. Instruments that fell between two standards were then reduced to a smaller size by luthiers. These reductions have an impact on several characteristics of violins, in particular on the contour lines, i.e., lines of constant altitude, which look more like a U for non-reduced instruments and a V for reduced ones. While such differences are observed by experts, they have not been studied quantitatively. This paper presents a method for classifying violins as reduced or non-reduced based on their contour lines. We study a corpus of 25 instruments whose 3D geometric meshes were acquired via photogrammetry. For each instrument, we extract 10-20 contour lines regularly spaced every millimetre. Each line is fitted with a parabola-like curve (with an equation of the type y = alpha*abs(x)**beta) depending on two parameters, describing how open (beta) and how vertically stretched (alpha) the curve is. We compute additional features from those parameters, using regressions and counting how many values fall under some threshold. We also deal with outliers and unequal numbers of levels, and eventually obtain a numerical profile for each instrument. We then apply classification methods to assess whether geometry alone can predict size reduction. We find that distinguishing between reduced and non-reduced instruments is feasible to some degree, taking into account that a whole spectrum of more or less transformed violins exists, for which it is more difficult to quantify the reduction. We also find the opening parameter beta to be the most predictive.
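One way to fit a curve of the type y = alpha*abs(x)**beta is to take logarithms, which gives log y = log alpha + beta * log|x| and reduces the problem to a straight-line regression. The sketch below does this with NumPy on synthetic data. The abstract does not say which fitting procedure the authors use, so the log-log least-squares approach, the `fit_contour` helper, and the sample values are assumptions.

```python
import numpy as np


def fit_contour(x, y):
    """Estimate (alpha, beta) in y = alpha * |x|**beta by log-log least squares."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # log is undefined at x == 0 and for y <= 0, so those samples are dropped.
    mask = (x != 0) & (y > 0)
    if mask.sum() < 2:
        raise ValueError("need at least two usable points")

    # A degree-1 polyfit returns (slope, intercept) = (beta, log alpha).
    beta, log_alpha = np.polyfit(np.log(np.abs(x[mask])), np.log(y[mask]), 1)
    return float(np.exp(log_alpha)), float(beta)


# Synthetic, noise-free contour line with made-up parameters.
x = np.linspace(-50.0, 50.0, 41)
y = 0.02 * np.abs(x) ** 1.8
alpha, beta = fit_contour(x, y)
print(f"alpha={alpha:.4f}, beta={beta:.4f}")
```

Fitting in log space weights relative rather than absolute errors, which can matter on noisy scans; a nonlinear least-squares fit on the raw coordinates would be a reasonable alternative.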
A Language-Driven Framework for Improving Personalized Recommendations: Merging LLMs with Traditional Algorithms
Traditional recommendation algorithms are not designed to provide personalized recommendations based on user preferences provided through text, e.g., "I enjoy light-hearted comedies with a lot of humor". Large Language Models (LLMs) have emerged as one of the most promising tools for natural language processing in recent years. This research proposes a novel framework that mimics how a close friend would recommend items based on their knowledge of an individual's tastes. We leverage LLMs to enhance movie recommendation systems by refining traditional algorithm outputs and integrating them with language-based user preference inputs. We employ Singular Value Decomposition (SVD) or SVD++ algorithms to generate initial movie recommendations, implemented using the Surprise Python library and trained on the MovieLens-Latest-Small dataset. We compare the performance of the base algorithms with our LLM-enhanced versions using leave-one-out validation hit rates and cumulative hit rates. Additionally, to compare the performance of our framework against the current state-of-the-art recommendation systems, we use rating and ranking metrics with an item-based stratified 0.75 train, 0.25 test split. Our framework can generate preference profiles automatically based on users' favorite movies or allow manual preference specification for more personalized results. Using an automated approach, our framework overwhelmingly surpassed SVD and SVD++ on every evaluation metric used (e.g., improvements of up to ~6x in cumulative hit rate and ~3.7x in NDCG), albeit at the cost of a slight increase in computational overhead.
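For readers unfamiliar with leave-one-out hit rate, the sketch below shows one common way to compute it: hold out one rated movie per user, generate a top-N list from the remaining data, and count how often the held-out movie appears in that list. This is an illustrative assumption about the metric rather than the authors' evaluation code. The `hit_rate` function and the toy data are hypothetical, and the top-N lists would in practice come from a trained model such as SVD.

```python
def hit_rate(top_n_by_user, held_out_by_user):
    """Fraction of users whose held-out item appears in their top-N list."""
    users = list(held_out_by_user)
    if not users:
        raise ValueError("no held-out items to evaluate")
    hits = sum(
        1
        for user in users
        # A user with no recommendations simply counts as a miss.
        if held_out_by_user[user] in top_n_by_user.get(user, [])
    )
    return hits / len(users)


# Toy data: one held-out movie per user and the top-N list produced without it.
top_n = {
    "u1": ["Toy Story", "Heat", "Alien"],
    "u2": ["Up", "Casino"],
    "u3": ["Fargo"],
}
held_out = {"u1": "Heat", "u2": "Alien", "u3": "Fargo"}

rate = hit_rate(top_n, held_out)
print(f"hit rate: {rate:.3f}")
```

The same function can score both a baseline and a re-ranked list, so the two can be compared on identical held-out items.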
You Asked, We Answered: All of Your AI Angst
This week, our host Lauren Goode, along with two of our senior writers, Kate Knibbs and Paresh Dave, dives into the show's inbox to answer listeners' questions. We look into a range of queries--from how AI is shaping the film industry to brainstorming what the collaboration between Jony Ive and OpenAI could look like. Mentioned in this episode: "This Viral AI Chatbot Will Lie and Say It's Human" by Lauren Goode and Tom Simonite; "A Political Battle Is Brewing Over Data Centers" by Molly Taft. Note: This is an automated transcript, which may contain errors. Lauren Goode: This is WIRED's Uncanny Valley, a show about the people, power, and influence of Silicon Valley.
The Simplistic Moral Lessons of "Superman"
The world may be going to hell, but the writer and director James Gunn has graced it with a sunshine "Superman." The most recent installments in the franchise--Zack Snyder's diptych "Man of Steel" (2013) and "Batman v Superman: Dawn of Justice" (2016)--had a hectic, howling, near-apocalyptic sense of tragedy, but Gunn's vision is bright, chipper, and sentimental. A title card announces that Superman has endured his first defeat, and the hero (played by David Corenswet) is shown tumbling from the sky and slamming with a sickening thud onto the surface of a frozen wasteland, where he lies prostrate, spitting red blood on the snow. Fear not: no sooner does the wounded combatant put his lips together and whistle for Krypto than his faithful and frisky canine companion arrives and drags his master back to the Fortress of Solitude. There, loyal robots examine the patient and, by exposing him to sunlight, begin to heal him.