Media
The Worst Thing About AI Is That People Can't Shut Up About It
A plea from WIRED's top boss: Say less. I tried to get out of this assignment so many times, in so many different ways. Not every package needs an editor's letter, I told them. I was very busy recording a new podcast, getting ready to speak at a tech conference, eating and sleeping, parenting, doodling, revising my to-do list, retying my shoelaces. I was doing my best, I tried to convey to my editor.
The Cure
Erotic imagery and curiosity often arise in intimate relationships, especially when there's safety, play, and mutual recognition. It doesn't mean you've done anything "wrong." On the contrary, it shows that your imagination is alive and searching for ways to bridge the gap between closeness and distance, fantasy and reality. You offer me something charged, even a bit embarrassing, and you're watching--will I crumble?
Ed Zitron Gets Paid to Love AI. He Also Gets Paid to Hate AI
He's one of the loudest voices of the AI haters--even as he does PR for AI companies. Either way, Ed Zitron has your attention. In his day job, Ed Zitron runs a boutique public relations firm called EZPR. This might surprise anyone who has come to know Zitron through his podcast or his social media or the newsletter in which he writes two-fisted stuff like "Sam Altman is full of shit" and "Mark Zuckerberg is a putrid ghoul." Flacks, as a rule, tend not to talk like this. Flacks send prim, throat-clearing emails to media people who do, on rare occasions, talk like this. Flacks want to touch base, hop on the phone, clear up a few things about the allegation that their CEO is a "chunderfuck." "And that really is one of the things with guys like Sam Altman and Dario Amodei from Anthropic," Zitron was saying over burgers on a fine Manhattan afternoon in September. "I work with founders all the time. I'm a founder myself, I guess--I don't like the title. But when you are a person that has to make more money than you lose, otherwise you lose your business, and you see these chunderfucks burning 5, 10 billion dollars in a year--and everyone's celebrating them?" We were talking about whether any of Zitron's ranting about the AI industry had cost him business on the PR side of the ledger. There was the one client who felt Zitron was being a little mean toward Altman, the CEO of OpenAI and the biggest chunderfuck of all, as far as Zitron is concerned. Founding a company is hard, the client said. "I said, 'I appreciate the comment, but, like, this isn't about you,'" Zitron told me. "His company is burning billions of dollars.
AI Is Not God
In recent times, there have been two techno-religious awakenings. To be human is to yearn for a Sky Daddy. Something that explains the unexplainable, someone to blame. No wonder, then, that in the ZIRP-fueled 2010s, when a new gospel of creation was being spread, some people started to see technology as a kind of religion. Startup founders and CEOs became messianic figures.
Divorced? With Kids? And an Impossible Ex? There's AI for That
They didn't want to put their children in the middle--so they put a machine there instead. Sol Kennedy used to ask his assistant to read the messages his ex-wife sent him. After the couple separated in 2020, Kennedy says, he found their communications "tough." An email, or a stream of them, would arrive--stuff about their two kids mixed with unrelated emotional wallops--and his day would be ruined trying to reply. Kennedy, a serial tech founder and investor in Silicon Valley, was in therapy at the time.
'People thought I was a communist doing this as a non-profit': is Wikipedia's Jimmy Wales the last decent tech baron?
In an online landscape characterised by doom and division, the people's encyclopedia stands out - a huge collective endeavour giving everyone free access to the sum of human knowledge. But with Elon Musk branding it 'Wokipedia' and AI looming large, can it survive? Wikipedia will be 25 years old in January. Jimmy Wales's daughter will be 25 and three weeks. It's not a coincidence: on Boxing Day 2000 Wales's then wife, Christine, gave birth to a baby girl, but it quickly became clear that something wasn't right. She had breathed in contaminated amniotic fluid, resulting in a life-threatening condition called meconium aspiration syndrome. An experimental treatment was available at the hospital near where they lived in San Diego. Did they want to try it?
Leveraging semantic similarity for experimentation with AI-generated treatments
Shi, Lei, Arbour, David, Addanki, Raghavendra, Sinha, Ritwik, Feller, Avi
Large Language Models (LLMs) enable a new form of digital experimentation where treatments combine human and model-generated content in increasingly sophisticated ways. The main methodological challenge in this setting is representing these high-dimensional treatments without losing their semantic meaning or rendering analysis intractable. Here, we address this problem by focusing on learning low-dimensional representations that capture the underlying structure of such treatments. These representations enable downstream applications such as guiding generative models to produce meaningful treatment variants and facilitating adaptive assignment in online experiments. We propose double kernel representation learning, which models the causal effect through the inner product of kernel-based representations of treatments and user covariates. We develop an alternating-minimization algorithm that learns these representations efficiently from data and provides convergence guarantees under a low-rank factor model. As an application of this framework, we introduce an adaptive design strategy for online experimentation and demonstrate the method's effectiveness through numerical experiments.
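The alternating-minimization idea mentioned in the abstract can be illustrated with a deliberately simplified sketch. This is not the authors' double kernel method: it assumes a fully observed table of effects, uses one-dimensional representations with no kernels, and the names `rank_one_als` and `effects` are hypothetical. It only shows the core pattern of fixing one set of representations, solving for the other in closed form, and then swapping roles.

```python
def rank_one_als(effects, iterations=50):
    """Fit effects[i][j] ~ t[i] * c[j] by alternating least squares.

    effects: hypothetical table of effects, one row per treatment and
    one column per user. t and c are one-dimensional stand-ins for the
    treatment and user representations (an assumption of this sketch).
    """
    n_treat, n_user = len(effects), len(effects[0])
    t = [1.0] * n_treat
    c = [1.0] * n_user
    for _ in range(iterations):
        # Step 1: hold c fixed and solve each t[i] by least squares.
        c_norm = sum(x * x for x in c)
        if c_norm == 0.0:
            break
        t = [sum(effects[i][j] * c[j] for j in range(n_user)) / c_norm
             for i in range(n_treat)]
        # Step 2: hold t fixed and solve each c[j] by least squares.
        t_norm = sum(x * x for x in t)
        if t_norm == 0.0:
            break
        c = [sum(effects[i][j] * t[i] for i in range(n_treat)) / t_norm
             for j in range(n_user)]
    return t, c


# Toy table that is exactly rank one: rows (2, -1) times columns (1, 3).
toy_effects = [[2.0, 6.0], [-1.0, -3.0]]
t_hat, c_hat = rank_one_als(toy_effects)
```

On this toy table the products `t_hat[i] * c_hat[j]` reproduce the entries of `toy_effects`; real experimental data would be noisy and partially observed, which is the setting the paper's guarantees are stated for.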
Towards Self-Evolving Benchmarks: Synthesizing Agent Trajectories via Test-Time Exploration under Validate-by-Reproduce Paradigm
Guo, Dadi, Zhou, Tianyi, Liu, Dongrui, Qian, Chen, Ren, Qihan, Shao, Shuai, Fan, Zhiyuan, Fung, Yi R., Wang, Kun, Zhang, Linfeng, Shao, Jing
Recent advances in large language models (LLMs) and agent system designs have empowered agents with unprecedented levels of capability. However, existing agent benchmarks are showing a trend of rapid ceiling-hitting by newly developed agents, making it difficult to meet the demands for evaluating agent abilities. To address this problem, we propose the Trajectory-based Validated-by-Reproducing Agent-benchmark Complexity Evolution (TRACE) framework. This framework takes an original task from an existing benchmark and encourages agents to freely explore and evolve it into a new task with higher difficulty while recording validatable agent trajectories. The framework proceeds in three stages: (1) evolutionary proposal mining, which provides task evolution proposals through preliminary exploration and divergent thinking; (2) problem formation and free exploration, where proposals are conceptualized into feasible problem candidates and the agents then explore them freely while recording their execution trajectories; and (3) multi-level validation, which ensures that the evolved tasks are accompanied by validatable and reproducible trajectories. Experiments on the GAIA benchmark demonstrate that the TRACE framework consistently enhances task complexity while improving the reliability of correctness through validatable execution trajectories. In addition, our framework can successfully adapt to and improve reasoning datasets represented by AIME-2024. This work marks a paradigm shift from static, manually curated benchmarks to dynamic, self-evolving evaluation systems, providing a sustainable and challenging runway for agent development.
Influence Guided Context Selection for Effective Retrieval-Augmented Generation
Deng, Jiale, Shen, Yanyan, Pei, Ziyuan, Chen, Youmin, Huang, Linpeng
Retrieval-Augmented Generation (RAG) addresses large language model (LLM) hallucinations by grounding responses in external knowledge, but its effectiveness is compromised by poor-quality retrieved contexts containing irrelevant or noisy information. While existing approaches attempt to improve performance through context selection based on predefined context quality assessment metrics, they show limited gains over standard RAG. We attribute this limitation to their failure in holistically utilizing available information (query, context list, and generator) for comprehensive quality assessment. Inspired by recent advances in data selection, we reconceptualize context quality assessment as an inference-time data valuation problem and introduce the Contextual Influence Value (CI value). This novel metric quantifies context quality by measuring the performance degradation when removing each context from the list, effectively integrating query-aware relevance, list-aware uniqueness, and generator-aware alignment. Moreover, CI value eliminates complex selection hyperparameter tuning by simply retaining contexts with positive CI values. To address practical challenges of label dependency and computational overhead, we develop a parameterized surrogate model for CI value prediction during inference. The model employs a hierarchical architecture that captures both local query-context relevance and global inter-context interactions, trained through oracle CI value supervision and end-to-end generator feedback. Extensive experiments across 8 NLP tasks and multiple LLMs demonstrate that our context selection method significantly outperforms state-of-the-art baselines, effectively filtering poor-quality contexts while preserving critical information. Code is available at https://github.com/SJTU-DMTai/RAG-CSM.
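The leave-one-out definition of the CI value described in the abstract can be sketched as follows. This is a minimal illustration, not the released RAG-CSM code: the function names and the `toy_utility` scorer are hypothetical, and in the paper's setting the utility would be the generator's measured performance on the query rather than a hand-written rule.

```python
def ci_values(contexts, utility):
    """Leave-one-out value of each context.

    utility: hypothetical callable that scores a list of contexts
    (a stand-in for generator performance on the query).
    """
    full_score = utility(contexts)
    values = []
    for i in range(len(contexts)):
        without_i = contexts[:i] + contexts[i + 1:]
        # Performance drop caused by removing context i.
        values.append(full_score - utility(without_i))
    return values


def select_contexts(contexts, utility):
    """Keep only the contexts whose CI value is positive."""
    values = ci_values(contexts, utility)
    return [c for c, v in zip(contexts, values) if v > 0]


def toy_utility(contexts):
    """Toy scorer (an assumption): +1 if the answer-bearing context is
    present, minus 0.25 for every other, noisy context."""
    answer = "Paris is the capital of France."
    score = 1.0 if answer in contexts else 0.0
    score -= 0.25 * sum(1 for c in contexts if c != answer)
    return score


retrieved = ["Paris is the capital of France.", "Bananas are yellow."]
kept = select_contexts(retrieved, toy_utility)
```

With this toy scorer the answer-bearing context gets a CI value of 1.0 and the noisy one gets -0.25, so only the first is kept. Computing the values this way needs labels and one extra generator call per context, which is the overhead the abstract's surrogate model is meant to avoid.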