 Generative AI


ChatGPT's new AI search beats Google in this one thing

PCWorld

OpenAI has removed the last barrier to using ChatGPT as a search engine: the login requirement. The feature launched last fall but required an account; now it can be used without registration. ChatGPT Search is essentially just ChatGPT and can be accessed at ChatGPT.com. Below the "Message ChatGPT" box you'll see a small icon called "Search" that can be clicked.


OpenAI co-founder John Schulman has left Anthropic after less than a year

Engadget

Less than a year into his tenure at the company, OpenAI co-founder John Schulman is leaving Anthropic. The startup confirmed Schulman's departure after The Information, Reuters and other publications reported on the exit. "We are sad to see John go but fully support his decision to pursue new opportunities and wish him all the very best," said Jared Kaplan, Anthropic's chief science officer, in a statement the company shared with Engadget. Schulman left OpenAI last August alongside Peter Deng, the company's former vice-president of consumer product. Schulman is considered one of the original architects of ChatGPT.


Explained: Generative AI's environmental impact

AIHub

In a two-part series, MIT News explores the environmental implications of generative AI. In this article, we look at why this technology is so resource-intensive. A second piece will investigate what experts are doing to reduce genAI's carbon footprint and other impacts. The excitement surrounding potential benefits of generative AI, from improving worker productivity to advancing scientific research, is hard to ignore. While the explosive growth of this new technology has enabled rapid deployment of powerful models in many industries, the environmental consequences of this generative AI "gold rush" remain difficult to pin down, let alone mitigate.


Reframing digital transformation through the lens of generative AI

MIT Technology Review

Enterprise adoption of generative AI technologies has grown explosively over the last two years. Powerful solutions underpinned by this new generation of large language models (LLMs) have been used to accelerate research, automate content creation, and replace clunky chatbots with AI assistants and more sophisticated AI agents that closely mimic human interaction. "In 2023 and the first part of 2024, we saw enterprises experimenting, trying out new use cases to see, 'What can this new technology do for me?'" explains Arthy Krishnamurthy, senior director for business transformation at Dataiku. But while many organizations were eager to adopt and exploit these exciting new capabilities, some may have underestimated the need to thoroughly scrutinize AI-related risks and recalibrate existing frameworks and forecasts for digital transformation.


Amazon set to release long-delayed Alexa generative AI revamp

The Japan Times

Amazon is set to release its long-awaited -- and delayed -- Alexa generative artificial intelligence voice service, said three people familiar with the matter, and has scheduled a media event for later this month to preview it. Once released, it would mark the most significant upgrade to the product since its initial introduction accelerated a wave of digital assistants more than a decade ago. Amazon on Wednesday sent media invites to an event to be held on Feb. 26 in New York featuring the head of its devices and services team, Panos Panay. A spokesperson said the event is Alexa-focused, while declining to elaborate.


Integrating Generative Artificial Intelligence in ADRD: A Framework for Streamlining Diagnosis and Care in Neurodegenerative Diseases

arXiv.org Artificial Intelligence

Healthcare systems are struggling to meet the growing demand for neurological care, with challenges particularly acute in Alzheimer's disease and related dementias (ADRD). While artificial intelligence research has often focused on identifying patterns beyond human perception, implementing such predictive capabilities remains challenging as clinicians cannot readily verify insights they cannot themselves detect. We propose that large language models (LLMs) offer more immediately practical applications by enhancing clinicians' capabilities in three critical areas: comprehensive data collection, interpretation of complex clinical information, and timely application of relevant medical knowledge. These challenges stem from limited time for proper diagnosis, growing data complexity, and an overwhelming volume of medical literature that exceeds any clinician's capacity to fully master. We present a framework for responsible AI integration that leverages LLMs' ability to communicate effectively with both patients and providers while maintaining human oversight. This approach prioritizes standardized, high-quality data collection to enable a system that learns from every patient encounter while incorporating the latest clinical evidence, continuously improving care delivery. We begin to address implementation challenges and initiate important discussions around ethical considerations and governance needs. While developed for ADRD, this roadmap provides principles for responsible AI integration across neurology and other medical specialties, with potential to improve diagnostic accuracy, reduce care disparities, and advance clinical knowledge through a learning healthcare system.


The Cake that is Intelligence and Who Gets to Bake it: An AI Analogy and its Implications for Participation

arXiv.org Artificial Intelligence

In a widely popular analogy by Turing Award Laureate Yann LeCun, machine intelligence has been compared to cake, where unsupervised learning forms the base, supervised learning adds the icing, and reinforcement learning is the cherry on top. We expand this "cake that is intelligence" analogy from a simple structural metaphor to the full life-cycle of AI systems, extending it to sourcing of ingredients (data), conception of recipes (instructions), the baking process (training), and the tasting and selling of the cake (evaluation and distribution). Leveraging our re-conceptualization, we describe each step's entailed social ramifications and how they are bounded by statistical assumptions within machine learning. Whereas these technical foundations and social impacts are deeply intertwined, they are often studied in isolation, creating barriers that restrict meaningful participation. Our re-conceptualization paves the way to bridge this gap by mapping where technical foundations interact with social outcomes, highlighting opportunities for cross-disciplinary dialogue. Finally, we conclude with actionable recommendations at each stage of the metaphorical AI cake's life-cycle, empowering prospective AI practitioners, users, and researchers with increased awareness and ability to engage in broader AI discourse.


Decoding AI Judgment: How LLMs Assess News Credibility and Bias

arXiv.org Artificial Intelligence

Large Language Models (LLMs) are increasingly used to assess news credibility, yet little is known about how they make these judgments. While prior research has examined political bias in LLM outputs or their potential for automated fact-checking, their internal evaluation processes remain largely unexamined. Understanding how LLMs assess credibility provides insights into AI behavior and how credibility is structured and applied in large-scale language models. This study benchmarks the reliability and political classifications of state-of-the-art LLMs - Gemini 1.5 Flash (Google), GPT-4o mini (OpenAI), and LLaMA 3.1 (Meta) - against structured, expert-driven rating systems such as NewsGuard and Media Bias Fact Check. Beyond assessing classification performance, we analyze the linguistic markers that shape LLM decisions, identifying which words and concepts drive their evaluations. We uncover patterns in how LLMs associate credibility with specific linguistic features by examining keyword frequency, contextual determinants, and rank distributions. Beyond static classification, we introduce a framework in which LLMs refine their credibility assessments by retrieving external information, querying other models, and adapting their responses. This allows us to investigate whether their assessments reflect structured reasoning or rely primarily on prior learned associations.
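The benchmarking step described above can be illustrated with a minimal sketch. The domain names, the labels, and the `agreement` helper are hypothetical placeholders, not data or code from the study; the sketch only shows one way model-assigned reliability labels might be compared with expert ratings.

```python
from collections import Counter

def agreement(llm_labels, expert_labels):
    """Compare LLM labels with expert labels on the domains both cover.

    Returns (accuracy, confusion), where confusion counts
    (expert_label, llm_label) pairs.
    """
    shared = sorted(set(llm_labels) & set(expert_labels))
    confusion = Counter((expert_labels[d], llm_labels[d]) for d in shared)
    correct = sum(n for (exp, llm), n in confusion.items() if exp == llm)
    accuracy = correct / len(shared) if shared else 0.0
    return accuracy, confusion

# Hypothetical ratings, invented for illustration only.
expert = {
    "a.example": "reliable",
    "b.example": "unreliable",
    "c.example": "reliable",
    "d.example": "unreliable",
}
llm = {
    "a.example": "reliable",
    "b.example": "reliable",
    "c.example": "reliable",
    "d.example": "unreliable",
}

accuracy, confusion = agreement(llm, expert)
print(f"accuracy = {accuracy:.2f}")
for (exp, model), n in sorted(confusion.items()):
    print(f"expert={exp:<10} llm={model:<10} count={n}")
```

The confusion counts make the direction of disagreement visible, for instance whether a model tends to rate expert-flagged sources as reliable.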


Illuminating Spaces: Deep Reinforcement Learning and Laser-Wall Partitioning for Architectural Layout Generation

arXiv.org Artificial Intelligence

Space layout design (SLD), although it occurs in the early stages of the design process, influences both the functionality and aesthetics of the ultimate architectural outcome. The complexity of SLD necessitates innovative approaches to efficiently explore vast solution spaces. While image-based generative AI has emerged as a potential solution, such approaches often rely on pixel-based space composition methods that lack intuitive representation of architectural processes. This paper leverages deep Reinforcement Learning (RL), as it offers a procedural approach that intuitively mimics the process of human designers. Effectively using RL for SLD requires an explorative space composing method to generate desirable design solutions. We introduce "laser-wall", a novel space partitioning method that conceptualizes walls as emitters of imaginary light beams to partition spaces. This approach bridges vector-based and pixel-based partitioning methods, offering both flexibility and exploratory power in generating diverse layouts. We present two planning strategies: one-shot planning, which generates entire layouts in a single pass, and dynamic planning, which allows for adaptive refinement by continuously transforming laser-walls. Additionally, we introduce on-light and off-light wall transformations for smooth and fast layout refinement, as well as identity-less and identity-full walls for versatile room assignment. We developed SpaceLayoutGym, an open-source OpenAI Gym compatible simulator for generating and evaluating space layouts. The RL agent processes the input design scenarios and generates solutions following a reward function that balances geometrical and topological requirements. Our results demonstrate that the RL-based laser-wall approach can generate diverse and functional space layouts that satisfy both geometric constraints and topological requirements and is architecturally intuitive.
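The idea of walls that emit beams to partition a plan can be illustrated with a toy grid sketch. This is a simplified reading of the abstract, not the paper's algorithm or the SpaceLayoutGym API: the grid encoding and the `emit_beams` and `count_rooms` helpers are assumptions made for illustration.

```python
from collections import deque

UP, DOWN, LEFT, RIGHT = (-1, 0), (1, 0), (0, -1), (0, 1)

def emit_beams(grid, row, col, directions):
    """Place a wall base at (row, col) and extend a beam along each direction
    until it reaches the plan boundary or an existing wall cell."""
    rows, cols = len(grid), len(grid[0])
    grid[row][col] = 1
    for dr, dc in directions:
        r, c = row + dr, col + dc
        while 0 <= r < rows and 0 <= c < cols and grid[r][c] == 0:
            grid[r][c] = 1
            r, c = r + dr, c + dc

def count_rooms(grid):
    """Count 4-connected regions of free cells (0) with a breadth-first flood fill."""
    rows, cols = len(grid), len(grid[0])
    seen = set()
    rooms = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if grid[r0][c0] != 0 or (r0, c0) in seen:
                continue
            rooms += 1
            seen.add((r0, c0))
            queue = deque([(r0, c0)])
            while queue:
                r, c = queue.popleft()
                for dr, dc in (UP, DOWN, LEFT, RIGHT):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and grid[nr][nc] == 0 and (nr, nc) not in seen):
                        seen.add((nr, nc))
                        queue.append((nr, nc))
    return rooms

# A 7x7 empty plan: 0 = free cell, 1 = wall cell.
plan = [[0] * 7 for _ in range(7)]
emit_beams(plan, 3, 3, [UP, DOWN])     # beams run to the top and bottom edges
emit_beams(plan, 1, 5, [LEFT, RIGHT])  # left beam stops at the first wall
rooms = count_rooms(plan)
print(rooms)
for line in plan:
    print("".join("#" if cell else "." for cell in line))
```

Because each beam stops at the first obstacle, the order in which walls are placed changes the resulting partition, which is one reason a sequential decision-making method such as RL is a natural fit.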


BOLT: Bootstrap Long Chain-of-Thought in Language Models without Distillation

arXiv.org Artificial Intelligence

Large language models (LLMs), such as o1 from OpenAI, have demonstrated remarkable reasoning capabilities. o1 generates a long chain-of-thought (LongCoT) before answering a question. LongCoT allows LLMs to analyze problems, devise plans, reflect, and backtrack effectively. These actions empower LLMs to solve complex problems. After the release of o1, many teams have attempted to replicate its LongCoT and reasoning capabilities. In terms of methods, they primarily rely on knowledge distillation with data from existing models with LongCoT capacities (e.g., OpenAI-o1, Qwen-QwQ, DeepSeek-R1-Preview), leaving significant uncertainties on systematically developing such reasoning abilities. In terms of data domains, these works focus narrowly on math while a few others include coding, limiting their generalizability. This paper introduces a novel approach to enable an LLM's LongCoT capacity without distillation from o1-like models or expensive human annotations, where we bootstrap LongCoT (BOLT) from a standard instruct model. BOLT involves three stages: 1) LongCoT data bootstrapping with in-context learning on a standard instruct model; 2) LongCoT supervised finetuning; 3) online training to further refine LongCoT capacities. In BOLT, only a few in-context examples need to be constructed during the bootstrapping stage; in our experiments, we created 10 examples, demonstrating the feasibility of this approach. We use Llama-3.1-70B-Instruct to bootstrap LongCoT and apply our method to various model scales (7B, 8B, 70B). We achieve impressive performance on a variety of benchmarks (Arena-Hard, MT-Bench, WildBench, ZebraLogic, MATH500) that evaluate diverse task-solving and reasoning capabilities.
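The first stage, bootstrapping with in-context learning, amounts to showing an instruct model a handful of worked examples and asking it to continue in the same style. The sketch below is a rough illustration under stated assumptions: the prompt layout, the example content, and the `build_bootstrap_prompt` helper are hypothetical and are not taken from the paper, and no model is actually called.

```python
def build_bootstrap_prompt(examples, query):
    """Assemble a few-shot prompt from (question, long_cot, answer) triples.

    The prompt ends with the new question and an open "Reasoning:" field so
    that a model would continue with its own long chain-of-thought.
    """
    parts = [
        f"Question: {question}\nReasoning: {long_cot}\nAnswer: {answer}"
        for question, long_cot, answer in examples
    ]
    parts.append(f"Question: {query}\nReasoning:")
    return "\n\n".join(parts)

# Hypothetical in-context example, written only for this illustration.
examples = [
    (
        "What is 12 * 15?",
        "Plan: split 15 into 10 + 5. 12 * 10 = 120 and 12 * 5 = 60. "
        "Check: 120 + 60 = 180.",
        "180",
    ),
]

prompt = build_bootstrap_prompt(examples, "What is 14 * 25?")
print(prompt)
```

In a full pipeline, the model's completions for many such queries would be collected and filtered to form the training data for the supervised finetuning stage.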