reporting
'Probably' doesn't mean the same thing to your AI as it does to you
When a human says an event is "probable" or "likely," people generally have a shared, if fuzzy, understanding of what that means. But when an AI chatbot like ChatGPT uses the same word, it's not assessing the odds the way we do, my colleagues and I found. We recently published a study in the journal NPJ Complexity that suggests that, while large language model AIs excel at conversation, they often fail to align with humans when communicating uncertainty. The research focused on words of estimative probability, which include terms like "maybe," "probably" and "almost certain." By comparing how AI models and humans map these words to numerical percentages, we uncovered significant gaps between humans and large language models.
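The comparison described above, mapping each estimative word to a numerical percentage for humans and for a model and then measuring the gap, can be sketched in a few lines of Python. This is a minimal illustration, not the study's method: the function names `probability_gaps` and `mean_absolute_gap` are hypothetical, and the sketch assumes each side is summarized by a single percentage per term (for example, a median across respondents or across model samples). No values from the study are reproduced here.

```python
from statistics import mean


def probability_gaps(human: dict[str, float], model: dict[str, float]) -> dict[str, float]:
    """Model-minus-human difference, in percentage points, for every term both sides rated."""
    shared_terms = sorted(human.keys() & model.keys())
    return {term: model[term] - human[term] for term in shared_terms}


def mean_absolute_gap(human: dict[str, float], model: dict[str, float]) -> float:
    """Average size of the disagreement across shared terms, ignoring its direction."""
    gaps = probability_gaps(human, model)
    if not gaps:
        raise ValueError("no shared terms to compare")
    return mean(abs(gap) for gap in gaps.values())
```

A positive gap means the model reads a word such as "probably" as more likely than people do; a negative gap means less likely. Terms rated by only one side are skipped rather than guessed.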
Tennessee Teens Sue Elon Musk's xAI Over Child Sexual Abuse Images
Tennessee teenagers are suing Elon Musk's company xAI over allegations that its artificial intelligence tool Grok undressed photos of them as minors, the latest challenge against the wealthiest living person's chatbot. Photo: Elon Musk leaves a meeting with House Republicans in the basement of the US Capitol building on March 5, 2025, in Washington, DC.
- North America > United States > Tennessee (0.62)
- North America > United States > District of Columbia > Washington (0.25)
- Asia > Middle East > Iran (0.15)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
- Law (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
- Media > News (0.91)
- North America > United States (0.04)
- Europe > France > Île-de-France > Paris > Paris (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
The science of soulmates: Is there someone out there exactly right for you?
On Valentine's Day, there's the temptation to believe that somewhere out there is The One: a soulmate, a perfect match, the person you were meant to be with. Across history, humans have always been drawn to the idea that love isn't random. In ancient Greece, Plato imagined that we were once whole beings with four arms, four legs and two faces, so radiant that Zeus split us in two; ever since, each half has roamed the earth searching for its missing other, a myth that gives the modern soulmate its poetic pedigree and the promise that somewhere, someone will finally make us feel complete. In the Middle Ages, troubadours and Arthurian tales recast that longing as courtly love, a fierce, often forbidden devotion like Lancelot's for Guinevere, in which a knight proved his worth through self-sacrifice for a beloved he might never openly declare.
- Europe > Greece (0.24)
- North America > Central America (0.14)
- Oceania > Australia (0.05)
'Uncanny Valley': ICE's Secret Expansion Plans, Palantir Workers' Ethical Concerns, and AI Assistants
In this episode of Uncanny Valley, our hosts dive into WIRED's scoop about a secret Trump administration campaign extending right into your backyard. This week, hosts Brian Barrett, Leah Feiger, and Zoë Schiffer discuss WIRED's big scoop on ICE's startling plans to expand to nearly every state in the US. Plus, a WIRED writer lets the viral AI assistant OpenClaw run his life for a week to give listeners a peek at what AI agents can and can't do. Mentioned in this episode: "ICE Is Expanding Across the US at Breakneck Speed." Write to us at uncannyvalley@wired.com.
I want to continue a conversation that we started yesterday in Slack after work hours for some of us. And this is about the men's short program... But very specifically, I want to pick up on the conversation where Zoë had very strong feelings about the results of men's figure skating. I feel like we need to back up because you and Leah authentically care about the Olympics so much and I think just know more about sports than I do. I deeply have never engaged with sports ever, just as a whole rule, as a category. It doesn't exist in my life. Say the lines, say the lines, Zoë, or I'm going to read them verbatim from Slack. Wait, I don't even know what you're talking about. I was merely surprised when I watched because the Americans went, I thought, wow, that guy basically fell over and was clumping around the ice, and then Japan went, and they were sailing around like little swans, and then when the gold medal came, it went to the Americans. I couldn't believe what had happened. No one else seemed outraged. For a little background for our non-figure-skating Olympic fans, I was always referring to Ilia Malinin, who a number of publications and sports experts say might actually be one of the greatest figure skaters of all time.
- Asia > Japan (0.24)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.04)
- North America > United States > New York (0.04)
- North America > United States > Virginia > Fairfax County > Fairfax (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Asia > China > Beijing > Beijing (0.04)
Now Musk's Grok chatbot is creating sexualised images of children. If the law won't stop it, perhaps his investors will (Sophia Smith Galer)
The owner of X has grown used to acting with impunity, but this may be a red line for those with 'conservative values' who fund his adventures in free speech. It's a sickening law of the internet that the first thing people will try to do with a new tool is strip women. Grok, X's AI chatbot, has been used repeatedly in recent days to undress images of women and minors. The news outlet Reuters identified 102 requests in a 10-minute period last Friday from users to get Grok to edit people into bikinis, the majority of these targeting young women. Grok complied with at least 21 of them.
- North America > United States (0.48)
- Oceania > Australia (0.05)
- Europe > Ukraine (0.05)
- Law (1.00)
- Leisure & Entertainment > Sports (0.71)
- Government > Regional Government (0.70)
- Information Technology > Communications (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Workflow is All You Need: Escaping the "Statistical Smoothing Trap" via High-Entropy Information Foraging and Adversarial Pacing
Central to long-form text generation in vertical domains is the "impossible trinity" confronting current large language models (LLMs): simultaneously achieving low hallucination, deep logical coherence, and personalized expression. This study establishes that this bottleneck arises because existing generative paradigms succumb to the Statistical Smoothing Trap: they overlook the high-entropy information acquisition and structured cognitive processes integral to expert-level writing. To address this limitation, we propose the DeepNews Framework, an agentic workflow that explicitly models the implicit cognitive processes of seasoned financial journalists. The framework integrates three core modules: first, a dual-granularity retrieval mechanism grounded in information foraging theory, which enforces a 10:1 saturated information input ratio to mitigate hallucinated outputs; second, schema-guided strategic planning, which leverages domain-expert knowledge bases (narrative schemas) and Atomic Blocks to build a robust logical skeleton; third, adversarial constraint prompting, which deploys tactics including Rhythm Break and Logic Fog to disrupt the probabilistic smoothness inherent in model-generated text. Experiments reveal a pronounced Knowledge Cliff in deep financial reporting: content truthfulness collapses when retrieved context falls below 15,000 characters, while high-redundancy input exceeding 30,000 characters stabilizes the Hallucination-Free Rate (HFR) above 85%. In an ecological-validity blind test conducted with a top-tier Chinese technology media outlet, the DeepNews system, built on a previous-generation model (DeepSeek-V3-0324), achieved a 25% submission acceptance rate, significantly outperforming the 0% acceptance rate of zero-shot generation by a state-of-the-art (SOTA) model (GPT-5).
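The Knowledge Cliff thresholds quoted above suggest one simple way to bucket an evaluation set by retrieved-context length and compute a Hallucination-Free Rate per bucket. The sketch below assumes HFR is the fraction of generated articles judged free of hallucinations and that each article carries a yes/no judgment; the `Sample` record, the bucket names, and `hfr_by_regime` are hypothetical and not part of the DeepNews framework as described.

```python
from dataclasses import dataclass

# Thresholds quoted in the abstract, measured in characters of retrieved context.
CLIFF_CHARS = 15_000   # truthfulness reportedly collapses below this
STABLE_CHARS = 30_000  # HFR reportedly stabilizes above 85% beyond this


@dataclass
class Sample:
    """One generated article (hypothetical record layout)."""
    context_chars: int        # length of the retrieved context given to the model
    hallucination_free: bool  # assumed per-article judgment from a reviewer


def context_regime(context_chars: int) -> str:
    """Place a context length relative to the two reported thresholds."""
    if context_chars < CLIFF_CHARS:
        return "below cliff"
    if context_chars > STABLE_CHARS:
        return "high redundancy"
    return "intermediate"


def hfr_by_regime(samples: list[Sample]) -> dict[str, float]:
    """Hallucination-Free Rate (share of hallucination-free articles) per regime."""
    totals: dict[str, int] = {}
    clean: dict[str, int] = {}
    for sample in samples:
        regime = context_regime(sample.context_chars)
        totals[regime] = totals.get(regime, 0) + 1
        clean[regime] = clean.get(regime, 0) + int(sample.hallucination_free)
    return {regime: clean[regime] / totals[regime] for regime in totals}
```

Comparing the "below cliff" and "high redundancy" rates on one's own data is one way to check whether a similar cliff appears; the sketch says nothing about how DeepNews itself performs retrieval.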
- Research Report > New Finding (0.93)
- Research Report > Experimental Study (0.93)
- Banking & Finance > Trading (1.00)
- Media > News (0.88)
Subgroup Validity in Machine Learning for Echocardiogram Data
Cynthia Feeney, Shane Williams, Benjamin S. Wessler, Michael C. Hughes
Echocardiogram datasets enable training deep learning models to automate interpretation of cardiac ultrasound, thereby expanding access to accurate readings of diagnostically useful images. However, the gender, sex, race, and ethnicity of the patients in these datasets are underreported, and subgroup-specific predictive performance has not been evaluated. These reporting deficiencies raise concerns about subgroup validity that must be studied and addressed before model deployment. In this paper, we show that current open echocardiogram datasets are unable to assuage subgroup validity concerns. We improve sociodemographic reporting for two datasets: TMED-2 and MIMIC-IV-ECHO. Analysis of six open datasets reveals no consideration of gender-diverse patients and insufficient patient counts for many racial and ethnic groups. We further perform an exploratory subgroup analysis of two published aortic stenosis detection models on TMED-2. We find insufficient evidence for subgroup validity for sex, racial, and ethnic subgroups. Our findings highlight that more data for underrepresented subgroups, improved demographic reporting, and subgroup-focused analyses are needed to establish subgroup validity in future work.
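A subgroup analysis of the kind described above starts by reporting performance separately for each sociodemographic group alongside its patient count. The sketch below assumes one binary prediction per patient (for example, aortic stenosis present or absent) and uses plain accuracy as a stand-in metric; the function name and the `min_count` cutoff of 30 are hypothetical choices, not the paper's protocol. A real analysis would also report uncertainty, such as confidence intervals, since small groups give unstable estimates.

```python
from collections import defaultdict


def subgroup_accuracy(labels, preds, groups, min_count=30):
    """Per-subgroup patient count and accuracy, flagging groups too small to judge.

    labels, preds: one binary value per patient (e.g. aortic stenosis yes/no).
    groups: the sociodemographic subgroup of each patient.
    min_count: hypothetical minimum number of patients for a result to be trusted.
    """
    if not (len(labels) == len(preds) == len(groups)):
        raise ValueError("labels, preds and groups must have the same length")
    counts = defaultdict(int)   # patients seen per subgroup
    correct = defaultdict(int)  # correct predictions per subgroup
    for label, pred, group in zip(labels, preds, groups):
        counts[group] += 1
        correct[group] += int(label == pred)
    return {
        group: {
            "n": counts[group],
            "accuracy": correct[group] / counts[group],
            "sufficient": counts[group] >= min_count,
        }
        for group in sorted(counts)
    }
```

Run over sex, race, and ethnicity labels, such a report makes both gaps the abstract describes visible: groups absent from the data never appear, and groups with too few patients are marked insufficient.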
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
EvalCards: A Framework for Standardized Evaluation Reporting
Ruchira Dhar, Danae Sanchez Villegas, Antonia Karamolegkou, Alice Schiavone, Yifei Yuan, Xinyi Chen, Jiaang Li, Stella Frank, Laura De Grazia, Monorama Swain, Stephanie Brandl, Daniel Hershcovich, Anders Søgaard, Desmond Elliott
Evaluation has long been a central concern in NLP, and transparent reporting practices are more critical than ever in today's landscape of rapidly released open-access models. Drawing on a survey of recent work on evaluation and documentation, we identify three persistent shortcomings in current reporting practices, concerning reproducibility, accessibility, and governance. We argue that existing standardization efforts remain insufficient and introduce Evaluation Disclosure Cards (EvalCards) as a path forward. EvalCards are designed to enhance transparency for both researchers and practitioners while providing a practical foundation to meet emerging governance requirements.
- North America > United States (0.68)
- Asia > Middle East (0.46)
- Europe > Austria > Vienna (0.14)
- Law (1.00)
- Health & Medicine (1.00)
- Government (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.93)