Generative AI
Generative AI in Sociological Research: State of the Discipline
Alvero, AJ, Stoltz, Dustin S., Stuhler, Oscar, Taylor, Marshall
Generative artificial intelligence (GenAI) has garnered considerable attention for its potential utility in research and scholarship. A growing body of work in sociology and related fields demonstrates both the potential advantages and risks of GenAI, but these studies are largely proof-of-concept or specific audits of models and products. We know comparatively little about how sociologists actually use GenAI in their research practices and how they view its present and future role in the discipline. In this paper, we describe the current landscape of GenAI use in sociological research based on a survey of authors in 50 sociology journals. Our sample includes both computational sociologists and non-computational sociologists and their collaborators. We find that sociologists primarily use GenAI to assist with writing tasks: revising, summarizing, editing, and translating their own work. Respondents report that GenAI saves time and that they are curious about its capabilities, but they do not currently feel strong institutional or field-level pressure to adopt it. Overall, respondents are wary of GenAI's social and environmental impacts and express low levels of trust in its outputs, but many believe that GenAI tools will improve over the next several years. We do not find large differences between computational and non-computational scholars in terms of GenAI use, attitudes, and concern; nor do we find strong patterns by familiarity or frequency of use. We discuss what these findings suggest about the future of GenAI in sociology and highlight challenges for developing shared norms around its use in research practice.
Geometric Uncertainty for Detecting and Correcting Hallucinations in LLMs
Phillips, Edward, Wu, Sean, Molaei, Soheila, Belgrave, Danielle, Thakur, Anshul, Clifton, David
Large language models demonstrate impressive results across diverse tasks but are still known to hallucinate, generating linguistically plausible but incorrect answers to questions. Uncertainty quantification has been proposed as a strategy for hallucination detection, requiring estimates for both global uncertainty (attributed to a batch of responses) and local uncertainty (attributed to individual responses). While recent black-box approaches have shown some success, they often rely on disjoint heuristics or graph-theoretic approximations that lack a unified geometric interpretation. We introduce a geometric framework to address this, based on archetypal analysis of batches of responses sampled with only black-box model access. At the global level, we propose Geometric Volume, which measures the convex hull volume of archetypes derived from response embeddings. At the local level, we propose Geometric Suspicion, which leverages the spatial relationship between responses and these archetypes to rank reliability, enabling hallucination reduction through preferential response selection. Unlike prior methods that rely on discrete pairwise comparisons, our approach provides continuous semantic boundary points which have utility for attributing reliability to individual responses. Experiments show that our framework performs comparably to or better than prior methods on short form question-answering datasets, and achieves superior results on medical datasets where hallucinations carry particularly critical risks. We also provide theoretical justification by proving a link between convex hull volume and entropy. Large language models (LLMs) have achieved remarkable performance across diverse natural language processing tasks (Guo et al., 2025; Anthropic, 2025; Gemini Team, Google DeepMind, 2025; OpenAI, 2025) and are increasingly applied in areas such as medical diagnosis, law, and financial advice (Yang et al., 2025; Chen et al., 2024; Kong et al., 2024).
Hallucinations, however, where models generate plausible but false or fabricated content, pose significant risks for adoption in high-stakes applications (Farquhar et al., 2024). Recent work, for example, finds GPT-4 hallucinating in 28.6% of reference generation tasks (Chelli et al., 2024).
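The intuition behind using convex hull volume as a global uncertainty signal can be illustrated with a minimal sketch. This is not the authors' method: the abstract describes archetypes derived from response embeddings, whereas the code below assumes hypothetical 2D points standing in for embeddings and takes the hull of the raw points, so the "volume" is just an area. The function names and sample coordinates are illustrative assumptions.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def hull_area(points: List[Point]) -> float:
    """Area of the convex hull via the shoelace formula (0.0 if degenerate)."""
    hull = convex_hull(points)
    if len(hull) < 3:
        return 0.0
    twice_area = 0.0
    for i in range(len(hull)):
        x1, y1 = hull[i]
        x2, y2 = hull[(i + 1) % len(hull)]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


# Hypothetical 2D "embeddings" of sampled responses to one question.
consistent = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.03, 0.03)]
scattered = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0), (1.0, 1.0)]

print(hull_area(consistent))
print(hull_area(scattered))
```

Under these assumptions, a batch of responses that agree semantically occupies a small region and yields a small hull, while a batch that disagrees spreads out and yields a larger one. Real embeddings are high-dimensional, so a practical version would need dimensionality reduction or the archetypal analysis step the abstract describes.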
Hey AI, Generate Me a Hardware Code! Agentic AI-based Hardware Design & Verification
Gadde, Deepak Narayan, Radhakrishna, Keerthan Kopparam, Viswambharan, Vaisakh Naduvodi, Kumar, Aman, Lettnin, Djones, Kunz, Wolfgang, Simon, Sebastian
Modern Integrated Circuits (ICs) are becoming increasingly complex, and so is their development process. Hardware design verification entails a methodical and disciplined approach to the planning, development, execution, and sign-off of functionally correct hardware designs. This tedious process requires significant effort and time to ensure a bug-free tape-out. The field of Natural Language Processing has undergone a significant transformation with the advent of Large Language Models (LLMs). These powerful models, often referred to as Generative AI (GenAI), have revolutionized how machines understand and generate human language, enabling unprecedented advancements in a wide array of applications, including hardware design verification. This paper presents an agentic AI-based approach to hardware design verification, which empowers AI agents, in collaboration with Human-in-the-Loop (HITL) intervention, to engage in a more dynamic, iterative, and self-reflective process, ultimately performing end-to-end hardware design and verification. This methodology is evaluated on five open-source designs, achieving over 95% coverage with reduced verification time while demonstrating superior performance, adaptability, and configurability.
Sam Altman issues 'code red' at OpenAI as ChatGPT contends with rivals
Sam Altman, OpenAI's chief executive, sent an internal memo to staff saying Gemini 3 could create 'temporary economic headwinds' for the company. Chief executive tells staff it is 'critical time' for chatbot as it faces intense competition from Google's new Gemini 3. Sam Altman has declared a "code red" at OpenAI to improve ChatGPT as the chatbot faces intense competition from rivals. According to a report by tech news site the Information, the chief executive of the San Francisco-based startup told staff in an internal memo: "We are at a critical time for ChatGPT." OpenAI has been rattled by the success of Google's latest AI model, Gemini 3, and is devoting more internal resources to improving ChatGPT.
Amazon Has New Frontier AI Models--and a Way for Customers to Build Their Own
Nova Forge lets Amazon's customers train frontier models for different tasks--a potential breakthrough in making AI actually useful for businesses. Amazon has announced a new family of frontier artificial intelligence models--and a new way for customers to build frontier models of their own. The ecommerce giant announced the second generation of its Nova AI models at re:Invent, a company conference held in Las Vegas. The models are nowhere near as popular as those offered by rivals like OpenAI and Google, but Amazon's plan to make them highly customizable could see them gain traction with its cloud users. Amazon detailed two improved large language models, Nova Lite and Nova Pro; a new realtime voice model called Nova Sonic; and a more experimental model called Nova Omni that performs a simulated kind of reasoning using images, audio, and video as well as text.
The CEO Who Believes AGI Is Already Here
Welcome back to TIME's new twice-weekly newsletter about AI. The three most valuable private companies in the U.S. have big reputations: OpenAI, SpaceX, and Anthropic. But the fourth, Databricks, flies a little more under the radar. This company, which is currently raising funds at a valuation of $134 billion according to reports this week, is a quiet workhorse of the AI revolution.
The Download: AI's impact on the economy, and DeepSeek strikes again
Any far-reaching new technology is always uneven in its adoption, but few have been more uneven than generative AI. That makes it hard to assess its likely impact on individual businesses, let alone on productivity across the economy as a whole. At one extreme, AI coding assistants have revolutionized the work of software developers. At the other extreme, most companies are seeing little if any benefit from their initial investments. That has provided fuel for the skeptics who maintain that--by its very nature as a probabilistic technology prone to hallucinating--generative AI will never have a deep impact on business. To students of tech history, though, the lack of immediate impact is normal.
SoftBank's Son 'cried' about Nvidia stake sale to fund AI bets
Masayoshi Son, chairman and chief executive officer of SoftBank Group, speaks during the Future Investment Initiative (FII) Institute Priority Asia conference in Tokyo on Monday. SoftBank Group founder Masayoshi Son said he wouldn't have sold off Nvidia shares if his company had unlimited money to bankroll its next investments in artificial intelligence, which include a big bet on OpenAI. Son, addressing for the first time the surprise November disclosure that SoftBank had unloaded its entire stake in the world's most valuable company, also slammed talk of an AI investment bubble. The Japanese company simply needed to raise capital to fund projects including data center construction, he told a forum in Tokyo Monday. "I don't want to sell a single share. I just had more need for money to invest in OpenAI" and other projects, Son said during the FII Priority Asia forum.
Exploring Human Perceptions of AI Responses: Insights from a Mixed-Methods Study on Risk Mitigation in Generative Models
Candello, Heloisa, Azmat, Muneeza, Gunturi, Uma Sushmitha, Horesh, Raya, de Paula, Rogerio Abreu, Pimentel, Heloisa, Grave, Marcelo Carpinette, Adebiyi, Aminat, Machado, Tiago, de Macedo, Maysa Malfiza Garcia
With the rapid uptake of generative AI, investigating human perceptions of generated responses has become crucial. A major challenge is the models' 'aptitude' for hallucinating and generating harmful content. Despite major efforts to implement guardrails, human perceptions of these mitigation strategies are largely unknown. We conducted a mixed-methods experiment to evaluate the responses of a mitigation strategy across multiple dimensions: faithfulness, fairness, harm-removal capacity, and relevance. In a within-subject study design, 57 participants assessed the responses under two conditions: harmful response plus its mitigation and solely mitigated response. Results revealed that participants' native language, AI work experience, and annotation familiarity significantly influenced evaluations. Participants showed high sensitivity to linguistic and contextual attributes, penalizing minor grammar errors while rewarding preserved semantic contexts. This contrasts with how language is often treated in the quantitative evaluation of LLMs. We also introduced new metrics for training and evaluating mitigation strategies and insights for human-AI evaluation studies.
A Comparison of Human and ChatGPT Classification Performance on Complex Social Media Data
Green, Breanna E., Shea, Ashley L., Zhao, Pengfei, Margolin, Drew B.
Generative artificial intelligence tools, like ChatGPT, are an increasingly utilized resource among computational social scientists. Nevertheless, there remains room for a better understanding of ChatGPT's performance in complex tasks such as classifying and annotating datasets containing nuanced language. In this paper, we measure the performance of GPT-4 on one such task and compare results to human annotators. We investigate ChatGPT versions 3.5, 4, and 4o to examine performance given the rapid pace of advancement in large language models. We craft four prompt styles as input and evaluate precision, recall, and F1 scores. Both quantitative and qualitative evaluations of results demonstrate that while including label definitions in prompts may help performance, overall GPT-4 has difficulty classifying nuanced language. Qualitative analysis reveals four specific findings. Our results suggest the use of ChatGPT in classification tasks involving nuanced language should be conducted with prudence.
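For readers less familiar with the metrics named in this abstract, the following minimal sketch shows how precision, recall, and F1 might be computed when model labels are compared against human annotations. The label names and sample data are hypothetical and not taken from the study, and the function is an illustration rather than the authors' evaluation code.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    gold: Sequence[str], predicted: Sequence[str], positive: str
) -> Tuple[float, float, float]:
    """Score predicted labels against gold labels for one positive class."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if p == positive and g == positive)
    fp = sum(1 for g, p in pairs if p == positive and g != positive)
    fn = sum(1 for g, p in pairs if p != positive and g == positive)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return precision, recall, f1


# Hypothetical annotations: human labels are treated as the reference.
human = ["sarcasm", "literal", "sarcasm", "literal", "sarcasm"]
model = ["sarcasm", "sarcasm", "literal", "literal", "sarcasm"]

p, r, f = precision_recall_f1(human, model, positive="sarcasm")
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Precision penalizes labels the model assigns that the human annotators did not, recall penalizes human labels the model missed, and F1 is their harmonic mean, so a model that over-assigns or under-assigns a nuanced category scores poorly on at least one of the three.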