Probing the Gaps in ChatGPT Live Video Chat for Real-World Assistance for People who are Blind or Visually Impaired
Chang, Ruei-Che, Natalie, Rosiana, Xu, Wenqian, Yap, Jovan Zheng Feng, Guo, Anhong
Recent advancements in large multimodal models have provided blind or visually impaired (BVI) individuals with new capabilities to interpret and engage with the real world through interactive systems that utilize live video feeds. However, the potential benefits and challenges of such capabilities to support diverse real-world assistive tasks remain unclear. In this paper, we present findings from an exploratory study with eight BVI participants. Participants used ChatGPT's Advanced Voice with Video, a state-of-the-art live video AI released in late 2024, in various real-world scenarios, from locating objects to recognizing visual landmarks, across unfamiliar indoor and outdoor environments. Our findings indicate that current live video AI effectively provides guidance and answers for static visual scenes but falls short in delivering essential live descriptions required in dynamic situations. Despite inaccuracies in spatial and distance information, participants leveraged the provided visual information to supplement their mobility strategies. Although the system was perceived as human-like due to high-quality voice interactions, assumptions about users' visual abilities, hallucinations, generic responses, and a tendency towards sycophancy led to confusion, distrust, and potential risks for BVI users. Based on the results, we discuss implications for assistive video AI agents, including incorporating additional sensing capabilities for real-world use, determining appropriate intervention timing beyond turn-taking interactions, and addressing ecological and safety concerns.
AnnoSense: A Framework for Physiological Emotion Data Collection in Everyday Settings for AI
Singh, Pragya, Gupta, Ankush, Kumar, Mohan, Singh, Pushpendra
Emotional and mental well-being are vital components of quality of life, and with the rise of smart devices like smartphones, wearables, and artificial intelligence (AI), new opportunities for monitoring emotions in everyday settings have emerged. However, for AI algorithms to be effective, they require high-quality data and accurate annotations. As the focus shifts towards collecting emotion data in real-world environments to capture more authentic emotional experiences, the process of gathering emotion annotations has become increasingly complex. This work explores the challenges of everyday emotion data collection from the perspectives of key stakeholders. We collected 75 survey responses, conducted 32 interviews with the public, and held 3 focus group discussions (FGDs) with 12 mental health professionals. The insights gained from a total of 119 stakeholders informed the development of our framework, AnnoSense, designed to support everyday emotion data collection for AI. This framework was then evaluated by 25 emotion AI experts for its clarity, usefulness, and adaptability. Lastly, we discuss the potential next steps and implications of AnnoSense for future research in emotion AI, highlighting its potential to enhance the collection and analysis of emotion data in real-world contexts.
A glimpse into OpenAI's largest ambitions
As Will points out, there were two recent wins for OpenAI in its efforts to build AI that outcompetes humans. Its models took second place at a top-level coding competition and--alongside those from Google DeepMind--achieved gold-medal-level results in the 2025 International Math Olympiad. People who believe that AI doesn't pose genuine competition to human-level intelligence might actually take some comfort in that. AI is good at the mathematical and analytical, which are on full display in olympiads and coding competitions. That doesn't mean it's any good at grappling with the messiness of human emotions, making hard decisions, or creating art that resonates with anyone. But that distinction--between machine-like reasoning and the ability to think creatively--is not one OpenAI's heads of research are inclined to make.
How Supercomputing Will Evolve, According to Jack Dongarra
High-performance supercomputing--once the exclusive domain of scientific research--is now a strategic resource for training increasingly complex artificial intelligence models. This convergence of AI and HPC is redefining not only these technologies but also the ways in which knowledge is produced, and it is taking on a strategic position in the global landscape. To discuss how HPC is evolving, in July WIRED caught up with Jack Dongarra, a US computer scientist who has been a key contributor to the development of HPC software over the past four decades--so much so that in 2021 he earned the prestigious Turing Award. The meeting took place at the 74th Nobel Laureate Meeting in Lindau, Germany, which brought together dozens of Nobel laureates as well as more than 600 emerging scientists from around the world. This interview has been edited for length and clarity.
It's High Time: A Survey of Temporal Question Answering
Piryani, Bhawna, Abdallah, Abdelrahman, Mozafari, Jamshid, Anand, Avishek, Jatowt, Adam
Time plays a critical role in how information is generated, retrieved, and interpreted. In this survey, we provide a comprehensive overview of Temporal Question Answering (TQA), a research area that focuses on answering questions involving temporal constraints or context. As the amount of time-stamped content from sources like news articles, web archives, and knowledge bases increases, systems must address challenges such as detecting temporal intent, normalizing time expressions, ordering events, and reasoning over evolving or ambiguous facts. We focus on recent advances in TQA enabled by neural architectures, especially transformer-based models and Large Language Models (LLMs), highlighting progress in temporal language modeling, retrieval-augmented generation (RAG), and temporal reasoning. We also discuss benchmark datasets and evaluation strategies designed to test temporal robustness, recency awareness, and generalization.
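One of the challenges named above, normalizing time expressions, can be illustrated with a minimal Python sketch. It is not taken from the survey: the function name `normalize_time_expression` and the tiny rule set are hypothetical, and real TQA systems handle far more patterns (and typically use learned models rather than hand-written rules).

```python
import re
from datetime import date, timedelta
from typing import Optional

# Hypothetical, deliberately small rule set (an assumption for illustration).
_UNIT_DAYS = {"day": 1, "week": 7}


def normalize_time_expression(text: str, reference: date) -> Optional[str]:
    """Map a simple time expression to an ISO 8601 date string.

    Relative expressions are resolved against `reference` (for example,
    the publication date of the document). Returns None when no rule applies.
    """
    t = text.strip().lower()

    # Fixed deictic words.
    offsets = {"today": 0, "yesterday": -1, "tomorrow": 1}
    if t in offsets:
        return (reference + timedelta(days=offsets[t])).isoformat()

    # Patterns such as "3 days ago" or "2 weeks ago".
    m = re.fullmatch(r"(\d+) (day|week)s? ago", t)
    if m:
        days = int(m.group(1)) * _UNIT_DAYS[m.group(2)]
        return (reference - timedelta(days=days)).isoformat()

    # Already-absolute dates in YYYY-MM-DD form pass through if valid.
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", t):
        try:
            return date.fromisoformat(t).isoformat()
        except ValueError:
            return None

    return None
```

Once expressions are mapped to ISO dates, events can be ordered by plain string comparison, which is one reason normalization usually precedes event ordering.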
Decomposing Representation Space into Interpretable Subspaces with Unsupervised Learning
Understanding internal representations of neural models is a core interest of mechanistic interpretability. Due to its large dimensionality, the representation space can encode various aspects about inputs. To what extent are different aspects organized and encoded in separate subspaces? Is it possible to find these "natural" subspaces in a purely unsupervised way? Somewhat surprisingly, we can indeed achieve this and find interpretable subspaces by a seemingly unrelated training objective. Our method, neighbor distance minimization (NDM), learns non-basis-aligned subspaces in an unsupervised manner. Qualitative analysis shows subspaces are interpretable in many cases, and encoded information in obtained subspaces tends to share the same abstract concept across different inputs, making such subspaces similar to "variables" used by the model. We also conduct quantitative experiments using known circuits in GPT-2; results show a strong connection between subspaces and circuit variables. We also provide evidence showing scalability to 2B models by finding separate subspaces mediating context and parametric knowledge routing. Viewed more broadly, our findings offer a new perspective on understanding model internals and building circuits.
A comprehensive taxonomy of hallucinations in Large Language Models
Large language models (LLMs) have revolutionized natural language processing, yet their propensity for hallucination, generating plausible but factually incorrect or fabricated content, remains a critical challenge. This report provides a comprehensive taxonomy of LLM hallucinations, beginning with a formal definition and a theoretical framework that posits its inherent inevitability in computable LLMs, irrespective of architecture or training. It explores core distinctions, differentiating between intrinsic (contradicting input context) and extrinsic (inconsistent with training data or reality) hallucinations, as well as between factuality (absolute correctness) and faithfulness (adherence to input). The report then details specific manifestations, including factual errors, contextual and logical inconsistencies, temporal disorientation, ethical violations, and task-specific hallucinations across domains like code generation and multimodal applications. It analyzes the underlying causes, categorizing them into data-related issues, model-related factors, and prompt-related influences. Furthermore, the report examines cognitive and human factors influencing hallucination perception, surveys evaluation benchmarks and metrics for detection, and outlines architectural and systemic mitigation strategies. Finally, it introduces web-based resources for monitoring LLM releases and performance. This report underscores the complex, multifaceted nature of LLM hallucinations and emphasizes that, given their theoretical inevitability, future efforts must focus on robust detection, mitigation, and continuous human oversight for responsible and reliable deployment in critical applications.
Jim Acosta 'interviews' AI-generated avatar of deceased teenager promoting gun control message
Jim Acosta and James Carville speculated whether President Trump will try to rig the 2026 midterms in his favor on "The Jim Acosta Show." Liberal journalist Jim Acosta "interviewed" the artificially animated avatar of deceased teenager Joaquin Oliver to promote a gun control message on Monday. Working with the gun control group Change the Ref, founded by Oliver's parents, Acosta had a conversation on his Substack with an avatar created by Oliver's father. Oliver was killed in the Parkland high school shooting in 2018. He would have turned 25 on Monday. "I would like to know what your solution would be for gun violence," Acosta asked.
North Carolina auditor excited for 'real effect' of state-level DOGE: 'Keeping government accountable'
EXCLUSIVE: North Carolina's state auditor said he is looking forward to making a positive impact on taxpayers by implementing a state version of the Department of Government Efficiency (DOGE). In an exclusive interview with Fox News Digital, North Carolina state auditor Dave Boliek said his office would look into how the state government can be more efficient and utilize the resources it has in the "best possible way" for taxpayers. He plans on doing that through House Bill 125, a state-level DOGE initiative named after him that recently passed the legislature. "It helps to give our office and the state auditor's office more resources to take a look at efficiencies and ways to really drill down on determining a good return on investment of taxpayer dollars across North Carolina," Boliek said. "I really support the effort," he said, in part.
Automated Feedback on Student-Generated UML and ER Diagrams Using Large Language Models
Gürtl, Sebastian, Schimetta, Gloria, Kerschbaumer, David, Liut, Michael, Steinmaurer, Alexander
UML and ER diagrams are foundational in computer science education but come with challenges for learners due to the need for abstract thinking, contextual understanding, and mastery of both syntax and semantics. These complexities are difficult to address through traditional teaching methods, which often struggle to provide scalable, personalized feedback, especially in large classes. We introduce DUET (Diagrammatic UML & ER Tutor), a prototype of an LLM-based tool, which converts a reference diagram and a student-submitted diagram into a textual representation and provides structured feedback based on the differences. It uses a multi-stage LLM pipeline to compare diagrams and generate reflective feedback. Furthermore, the tool enables analytical insights for educators, aiming to foster self-directed learning and inform instructional strategies. We evaluated DUET through semi-structured interviews with six participants, including two educators and four teaching assistants. They identified strengths such as accessibility, scalability, and learning support alongside limitations, including reliability and potential misuse. Participants also suggested potential improvements, such as bulk upload functionality and interactive clarification features. DUET presents a promising direction for integrating LLMs into modeling education and offers a foundation for future classroom integration and empirical evaluation.
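The core idea described above, turning a reference diagram and a student diagram into text and giving feedback on the differences, can be sketched in a few lines of Python. This is not DUET's implementation: the abstract describes a multi-stage LLM pipeline, whereas this sketch substitutes a plain set difference, and the `Diagram` class, `to_text`, and `diff_feedback` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Diagram:
    """Hypothetical minimal ER-style diagram: entities plus labeled relationships."""
    entities: FrozenSet[str]
    relationships: FrozenSet[Tuple[str, str, str]]  # (source, label, target)


def to_text(diagram: Diagram) -> List[str]:
    """Serialize a diagram into sorted, line-based text so two diagrams can be compared."""
    lines = [f"entity {name}" for name in sorted(diagram.entities)]
    lines += [
        f"relationship {src} -[{label}]-> {dst}"
        for src, label, dst in sorted(diagram.relationships)
    ]
    return lines


def diff_feedback(reference: Diagram, submission: Diagram) -> List[str]:
    """List elements missing from, or unexpected in, the submission (set difference only)."""
    ref, sub = set(to_text(reference)), set(to_text(submission))
    feedback = [f"missing: {item}" for item in sorted(ref - sub)]
    feedback += [f"unexpected: {item}" for item in sorted(sub - ref)]
    return feedback
```

A structural diff like this only flags exact mismatches. It cannot recognize a renamed but equivalent entity or phrase reflective hints, which is presumably where an LLM stage adds value.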