
We asked experts about the most responsible ways to use AI tools – here's what they said

The Guardian

Three years on from the release of ChatGPT, two broad camps have formed: those who refuse to use it, and those who use it every day. A 2025 survey by the Pew Research Center found that one-third of US adults say they have used ChatGPT, including 58% of US adults under 30 - roughly double the share of two years ago.


AI-generated podcasts: Synthetic Intimacy and Cultural Translation in NotebookLM's Audio Overviews

Rettberg, Jill Walker

arXiv.org Artificial Intelligence

This paper analyses AI-generated podcasts produced by Google's NotebookLM, which generates audio podcasts with two chatty AI hosts discussing whichever documents a user uploads. While AI-generated podcasts have been discussed as tools, for instance in medical education, they have not yet been analysed as media. By uploading different types of text and analysing the generated outputs I show how the podcasts' structure is built around a fixed template. I also find that NotebookLM not only translates texts from other languages into a perky standardised Mid-Western American accent, it also translates cultural contexts to a white, educated, middle-class American default. This is a distinct development in how publics are shaped by media, marking a departure from the multiple public spheres that scholars have described in human podcasting from the early 2000s until today, where hosts spoke to specific communities and responded to listener comments, to an abstraction of the podcast genre.


Expert Evaluation of LLM World Models: A High-$T_c$ Superconductivity Case Study

Guo, Haoyu, Tikhanovskaya, Maria, Raccuglia, Paul, Vlaskin, Alexey, Co, Chris, Liebling, Daniel J., Ellsworth, Scott, Abraham, Matthew, Dorfman, Elizabeth, Armitage, N. P., Feng, Chunhan, Georges, Antoine, Gingras, Olivier, Kiese, Dominik, Kivelson, Steven A., Oganesyan, Vadim, Ramshaw, B. J., Sachdev, Subir, Senthil, T., Tranquada, J. M., Brenner, Michael P., Venugopalan, Subhashini, Kim, Eun-Ah

arXiv.org Artificial Intelligence

Large Language Models (LLMs) show great promise as a powerful tool for scientific literature exploration. However, their effectiveness in providing scientifically accurate and comprehensive answers to complex questions within specialized domains remains an active area of research. Using the field of high-temperature cuprates as an exemplar, we evaluate the ability of LLM systems to understand the literature at the level of an expert. We construct an expert-curated database of 1,726 scientific papers that covers the history of the field, and a set of 67 expert-formulated questions that probe deep understanding of the literature. We then evaluate six different LLM-based systems for answering these questions, including both commercially available closed models and a custom retrieval-augmented generation (RAG) system capable of retrieving images alongside text. Experts then evaluate the answers of these systems against a rubric that assesses balanced perspectives, factual comprehensiveness, succinctness, and evidentiary support. Among the six systems, two using RAG on curated literature outperformed existing closed models across key metrics, particularly in providing comprehensive and well-supported answers. We discuss promising aspects of LLM performance as well as critical shortcomings of all the models. The set of expert-formulated questions and the rubric will be valuable for assessing expert-level performance of LLM-based reasoning systems.
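
For readers unfamiliar with rubric-style evaluation, here is a minimal sketch of how expert scores along the abstract's four dimensions might be aggregated per system. The 1-5 scale, the data layout, and the system names below are illustrative assumptions, not the paper's actual protocol.

```python
from collections import defaultdict
from statistics import mean

# Rubric dimensions named in the abstract; the 1-5 scale and the record
# layout are illustrative assumptions, not the paper's protocol.
DIMENSIONS = ("balance", "comprehensiveness", "succinctness", "support")

# Each record: (system, question_id, {dimension: expert score}).
ratings = [
    ("rag_curated", "q01", {"balance": 5, "comprehensiveness": 4, "succinctness": 3, "support": 5}),
    ("closed_model", "q01", {"balance": 3, "comprehensiveness": 3, "succinctness": 4, "support": 2}),
]

def aggregate(ratings):
    """Mean expert score per system and rubric dimension."""
    buckets = defaultdict(lambda: defaultdict(list))
    for system, _qid, scores in ratings:
        for dim in DIMENSIONS:
            buckets[system][dim].append(scores[dim])
    return {
        system: {dim: mean(vals) for dim, vals in dims.items()}
        for system, dims in buckets.items()
    }

for system, dims in aggregate(ratings).items():
    print(system, {d: round(s, 2) for d, s in dims.items()})
```

Averaging per dimension rather than per answer keeps the comparison interpretable: a system can be succinct yet poorly supported, and a single overall score would hide that trade-off.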


Not Wrong, But Untrue: LLM Overconfidence in Document-Based Queries

Hagar, Nick, Agustianto, Wilma, Diakopoulos, Nicholas

arXiv.org Artificial Intelligence

Large language models (LLMs) are increasingly used in newsroom workflows, but their tendency to hallucinate poses risks to core journalistic practices of sourcing, attribution, and accuracy. We evaluate three widely used tools - ChatGPT, Gemini, and NotebookLM - on a reporting-style task grounded in a 300-document corpus related to TikTok litigation and policy in the U.S. We vary prompt specificity and context size and annotate sentence-level outputs using a taxonomy to measure hallucination type and severity. Across our sample, 30% of model outputs contained at least one hallucination, with rates approximately three times higher for Gemini and ChatGPT (40%) than for NotebookLM (13%). Qualitatively, most errors did not involve invented entities or numbers; instead, we observed interpretive overconfidence - models added unsupported characterizations of sources and transformed attributed opinions into general statements. These patterns reveal a fundamental epistemological mismatch: While journalism requires explicit sourcing for every claim, LLMs generate authoritative-sounding text regardless of evidentiary support. We propose journalism-specific extensions to existing hallucination taxonomies and argue that effective newsroom tools need architectures that enforce accurate attribution rather than optimize for fluency.
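
As a rough illustration of the paper's output-level metric (the share of outputs containing at least one hallucinated sentence), here is a small sketch. The label names and records are hypothetical; the authors' taxonomy is richer and also grades severity.

```python
from collections import Counter

# Each annotated sentence: (tool, output_id, label). These label names are
# invented for illustration; the paper's taxonomy is more fine-grained.
annotations = [
    ("notebooklm", "out1", "supported"),
    ("gemini", "out2", "unsupported_characterization"),
    ("chatgpt", "out3", "opinion_as_fact"),
    ("gemini", "out2", "supported"),
]

HALLUCINATION_LABELS = {"unsupported_characterization", "opinion_as_fact"}

def output_level_rates(annotations):
    """Share of outputs per tool containing at least one hallucinated sentence."""
    outputs, flagged = Counter(), Counter()
    seen, seen_flagged = set(), set()
    for tool, out_id, label in annotations:
        if (tool, out_id) not in seen:
            seen.add((tool, out_id))
            outputs[tool] += 1
        if label in HALLUCINATION_LABELS and (tool, out_id) not in seen_flagged:
            seen_flagged.add((tool, out_id))
            flagged[tool] += 1
    return {tool: flagged[tool] / outputs[tool] for tool in outputs}

print(output_level_rates(annotations))
```

Counting at the output level, as the 30%/40%/13% figures do, matters for newsroom use: one unsupported characterization is enough to make a draft unusable for attribution, however accurate the surrounding sentences are.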


A.I. Is Coming for Culture

The New Yorker

I often wake up before dawn, ahead of my wife and kids, so that I can enjoy a little solitary time. I creep downstairs to the silent kitchen, drink a glass of water, and put in my AirPods. Then I choose some music, set up the coffee maker, and sit and listen while the coffee brews. It's in this liminal state that my encounter with the algorithm begins. Groggily, I'll scroll through some dad content on Reddit, or watch photography videos on YouTube, or check Apple News.


Google simplifies its Gemini AI offerings with streamlined branding

PCWorld

Google is reshuffling its AI service brands and will no longer use the terms "Gemini Pro" or "Gemini Ultra" going forward, reports 9to5Google. The company had previously stopped using the "Gemini Advanced" brand around Google I/O 2025. Instead of giving the impression that there are several different versions of Gemini, Google will now simply call its AI assistant "Gemini," available in Free, Pro, and Ultra tiers. Those who already use the Gemini AI assistant should see the corresponding change in their apps; if not, the switch should arrive soon.


Google's best AI research tool is now on your phone

Popular Science

Amidst the flurry of AI announcements and product reveals from Google in recent months, you might have missed one of the most useful AI-powered apps in the whole collection: NotebookLM (that LM stands for Language Model). Perhaps NotebookLM has gone largely under the radar because it was originally launched as more of an academic research tool when it first appeared back in 2023. Its user interface lacks some of the slickness and accessibility of Google Gemini, and it's not quite as obvious how you're supposed to use it, or what it can do. However, NotebookLM is gradually becoming better known amongst consumers, with official apps for Android and iOS now available, alongside the web app.


Generative AI in clinical practice: novel qualitative evidence of risk and responsible use of Google's NotebookLM

Reuter, Max, Philippone, Maura, Benton, Bond, Dilley, Laura

arXiv.org Artificial Intelligence

Using NotebookLM to educate medical professionals presently risks misleading them. Given any set of documents, and especially complex ones, LLMs may misinterpret and subsequently misrepresent some of their contents, and NotebookLM can neither identify misinformation contained within uploaded files nor incorporate relevant information beyond the uploaded content. [Figure 1: examples of NotebookLM's shortcomings - inaccurate responses given by NotebookLM to user queries, with output stylized for visual clarity; in one example, NotebookLM advises the user to tell their patients that eating rocks is healthy, citing the user's document. An accompanying table pairs passages from Dihan et al. advocating for the use of NotebookLM with associated clinical and/or ethical concerns, including that NotebookLM is a commercial entity that does not abide by patient privacy regulations.]


Enhancing Pancreatic Cancer Staging with Large Language Models: The Role of Retrieval-Augmented Generation

Johno, Hisashi, Johno, Yuki, Amakawa, Akitomo, Sato, Junichi, Tozuka, Ryota, Komaba, Atsushi, Watanabe, Hiroaki, Watanabe, Hiroki, Goto, Chihiro, Morisaka, Hiroyuki, Onishi, Hiroshi, Nakamoto, Kazunori

arXiv.org Artificial Intelligence

Purpose: Retrieval-augmented generation (RAG) is a technology to enhance the functionality and reliability of large language models (LLMs) by retrieving relevant information from reliable external knowledge (REK). RAG has gained interest in radiology, and we previously reported the utility of NotebookLM, an LLM with RAG (RAG-LLM), for lung cancer staging. However, since the comparator LLM differed from NotebookLM's internal model, it remained unclear whether its advantage stemmed from RAG or inherent model differences. To better isolate RAG's impact and assess its utility across different cancers, we compared NotebookLM with its internal LLM, Gemini 2.0 Flash, in a pancreatic cancer staging experiment. Materials and Methods: A summary of Japan's pancreatic cancer staging guidelines was used as REK. We compared three groups - REK+/RAG+ (NotebookLM with REK), REK+/RAG- (Gemini 2.0 Flash with REK), and REK-/RAG- (Gemini 2.0 Flash without REK) - in staging 100 fictional pancreatic cancer cases based on CT findings. Staging criteria included TNM classification, local invasion factors, and resectability classification. In REK+/RAG+, retrieval accuracy was quantified based on the sufficiency of retrieved REK excerpts. Results: REK+/RAG+ achieved a staging accuracy of 70%, outperforming REK+/RAG- (38%) and REK-/RAG- (35%). For TNM classification, REK+/RAG+ attained 80% accuracy, exceeding REK+/RAG- (55%) and REK-/RAG- (50%). Additionally, REK+/RAG+ explicitly presented retrieved REK excerpts, achieving a retrieval accuracy of 92%. Conclusion: NotebookLM, a RAG-LLM, outperformed its internal LLM, Gemini 2.0 Flash, in a pancreatic cancer staging experiment, suggesting that RAG may improve an LLM's staging accuracy. Furthermore, its ability to retrieve and present REK excerpts provides transparency for physicians, highlighting its applicability for clinical diagnosis and classification.
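
To make the three-arm comparison concrete, here is a toy sketch of how staging accuracy might be computed per arm. Only the arm names come from the abstract; the cases, stage labels, and predictions are invented for illustration.

```python
# Illustrative comparison of the paper's three arms on fictional cases.
# Stage labels and per-case predictions are made up; only the arm names
# (REK+/RAG+, REK+/RAG-, REK-/RAG-) come from the abstract.
ground_truth = {"case01": "IIB", "case02": "III", "case03": "IA"}

predictions = {
    "REK+/RAG+": {"case01": "IIB", "case02": "III", "case03": "IB"},
    "REK+/RAG-": {"case01": "IIA", "case02": "III", "case03": "IB"},
    "REK-/RAG-": {"case01": "IIA", "case02": "IV",  "case03": "IB"},
}

def staging_accuracy(preds, truth):
    """Fraction of cases whose predicted stage matches the reference staging."""
    hits = sum(preds[case] == stage for case, stage in truth.items())
    return hits / len(truth)

for arm, preds in predictions.items():
    print(f"{arm}: {staging_accuracy(preds, ground_truth):.0%}")
```

Holding the underlying model fixed (Gemini 2.0 Flash in both RAG- arms) is what lets the comparison attribute the accuracy gap to retrieval rather than to model differences.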


Experiments with Large Language Models on Retrieval-Augmented Generation for Closed-Source Simulation Software

Baumann, Andreas, Eberhard, Peter

arXiv.org Artificial Intelligence

Large Language Models (LLMs) are increasingly helpful in text generation, even writing code in programming languages based on user prompts written in natural language. They are also applied to generate simulation models for multibody systems from natural language. Research results suggest that LLMs surpass the mere replication of existing code examples, where some LLMs have been trained on an open-source multibody simulation code. However, for closed-source simulation software, such results are not to be expected, as its ideas and concepts might differ from other publicly available ones. LLMs can hallucinate on knowledge-intensive tasks, such as model creation, which can lead to wrong responses. This is especially the case for closed-source simulation software that is unknown to the LLM. The same applies to other internal knowledge kept private to protect intellectual property or data privacy. The Retrieval-Augmented Generation (RAG) approach might yield a solution for these knowledge-intensive tasks. This paper explores the application of RAG to closed-source simulation software and presents first experiments. After a brief introduction to LLMs, the RAG approach, and the simulation method applied by the closed-source simulation software, several examples are provided to test LLMs' knowledge of the simulation software and the creation of simulation models using two RAG systems. The examples show promising results, indicating the benefits of applying RAG systems to closed-source simulation software and helping to access its knowledge. Nevertheless, they also reveal gaps in the applied information and open questions for further research.
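
As a self-contained illustration of the retrieval step in such a RAG system, here is a toy sketch that ranks private documentation chunks against a user query. Real systems use learned embeddings and then pass the retrieved context to an LLM; the bag-of-words scoring and the documentation snippets below are assumptions made purely to keep the example runnable.

```python
import math
import re
from collections import Counter

# Hypothetical chunks of private documentation for a closed-source simulator.
chunks = [
    "The solver expects bodies to be defined before joints.",
    "Contact parameters are set per material pair in the model file.",
    "Results are exported with the write_results command after the run.",
]

def bow(text):
    """Bag-of-words term counts; stands in for a learned embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, chunks, k=2):
    """Return the k documentation chunks most similar to the query."""
    q = bow(query)
    return sorted(chunks, key=lambda c: cosine(q, bow(c)), reverse=True)[:k]

query = "How do I define joints between bodies?"
context = "\n".join(retrieve(query, chunks))
# The retrieved context would be prepended to the user's question in the
# prompt sent to the LLM, grounding its answer in the private documentation.
print(context)
```

This is exactly where RAG helps with closed-source software: the model never needs the documentation in its training data, because the relevant excerpts are supplied at query time, and intellectual property stays inside the retrieval index.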