Personal
Giving AI a voice: how does AI think it should be treated?
Fay, Maria, Flöther, Frederik F.
With the astounding progress in (generative) artificial intelligence (AI), there has been significant public discourse regarding regulation and ethics of the technology. Is it sufficient when humans discuss this with other humans? Or, given that AI is increasingly becoming a viable source of inspiration for people (let alone the hypothetical possibility that the technology may at some point become "artificial general intelligence" and/or develop consciousness), should AI not join the discourse? There are new questions and angles that AI brings to the table that we might not have considered before - so let us make the key subject of this book an active participant. This chapter therefore includes a brief human-AI conversation on the topic of AI rights and ethics.
RAG Without the Lag: Interactive Debugging for Retrieval-Augmented Generation Pipelines
Lauro, Quentin Romero, Shankar, Shreya, Zeighami, Sepanta, Parameswaran, Aditya
Retrieval-augmented generation (RAG) pipelines have become the de-facto approach for building AI assistants with access to external, domain-specific knowledge. Given a user query, RAG pipelines typically first retrieve (R) relevant information from external sources, before invoking a Large Language Model (LLM), augmented (A) with this information, to generate (G) responses. Modern RAG pipelines frequently chain multiple retrieval and generation components, in any order. However, developing effective RAG pipelines is challenging because retrieval and generation components are intertwined, making it hard to identify which component(s) cause errors in the eventual output. The parameters with the greatest impact on output quality often require hours of pre-processing after each change, creating prohibitively slow feedback cycles. To address these challenges, we present RAGGY, a developer tool that combines a Python library of composable RAG primitives with an interactive interface for real-time debugging. We contribute the design and implementation of RAGGY, insights into expert debugging patterns through a qualitative study with 12 engineers, and design implications for future RAG tools that better align with developers' natural workflows.
Mitigating LLM Hallucinations with Knowledge Graphs: A Case Study
Li, Harry, Appleby, Gabriel, Alperin, Kenneth, Gomez, Steven R, Suh, Ashley
High-stakes domains like cyber operations need responsible and trustworthy AI methods. While large language models (LLMs) are becoming increasingly popular in these domains, they still suffer from hallucinations. This research paper provides learning outcomes from a case study with LinkQ, an open-source natural language interface that was developed to combat hallucinations by forcing an LLM to query a knowledge graph (KG) for ground-truth data during question-answering (QA). We conduct a quantitative evaluation of LinkQ using a well-known KGQA dataset, showing that the system outperforms GPT-4 but still struggles with certain question categories - suggesting that alternative query construction strategies will need to be investigated in future LLM querying systems. We discuss a qualitative study of LinkQ with two domain experts using a real-world cybersecurity KG, outlining these experts' feedback, suggestions, perceived limitations, and future opportunities for systems like LinkQ.
The philosopher's machine: my conversation with Peter Singer's AI chatbot
"I'm Peter Singer AI," the avatar says. I am almost expecting it to continue, like a reincarnated Clippy: "It looks like you're trying to solve a problem." The problem I am trying to solve is why Peter Singer, the man who has been called the world's most influential living philosopher, has created a chatbot. And also, whether it is any good. Me: Why do you exist?
NodeRAG: Structuring Graph-based RAG with Heterogeneous Nodes
Xu, Tianyang, Zheng, Haojie, Li, Chengze, Chen, Haoxiang, Liu, Yixin, Chen, Ruoxi, Sun, Lichao
Retrieval-augmented generation (RAG) empowers large language models to access external and private corpora, enabling factually consistent responses in specific domains. By exploiting the inherent structure of the corpus, graph-based RAG methods further enrich this process by building a knowledge graph index and leveraging the structural nature of graphs. However, current graph-based RAG approaches seldom prioritize the design of graph structures. Inadequately designed graphs not only impede the seamless integration of diverse graph algorithms but also result in workflow inconsistencies and degraded performance. To further unleash the potential of graphs for RAG, we propose NodeRAG, a graph-centric framework introducing heterogeneous graph structures that enable the seamless and holistic integration of graph-based methodologies into the RAG workflow. By aligning closely with the capabilities of LLMs, this framework ensures a fully cohesive and efficient end-to-end process. Through extensive experiments, we demonstrate that NodeRAG exhibits performance advantages over previous methods, including GraphRAG and LightRAG, not only in indexing time, query time, and storage efficiency but also in delivering superior question-answering performance on multi-hop benchmarks and open-ended head-to-head evaluations with minimal retrieval tokens. Our GitHub repository can be found at https://github.com/Terry-Xu-666/NodeRAG.
Grace Wahba awarded the 2025 International Prize in Statistics
The International Prize in Statistics Foundation has awarded Grace Wahba the 2025 prize for "her groundbreaking work on smoothing splines, which has transformed data analysis and machine learning". Professor Wahba was among the earliest to pioneer the use of nonparametric regression modeling. Recent advances in computing and availability of large data sets have further popularized these models, especially under the guise of machine learning algorithms such as gradient boosting and neural networks. Nevertheless, the use of smoothing splines remains a mainstay of nonparametric regression. In seminal research that began in the early 1970s, Wahba developed theoretical foundations and computational algorithms for fitting smoothing splines to noisy data.
The Robotability Score: Enabling Harmonious Robot Navigation on Urban Streets
Franchi, Matt, Parreira, Maria Teresa, Bu, Fanjun, Ju, Wendy
This paper introduces the Robotability Score ($R$), a novel metric that quantifies the suitability of urban environments for autonomous robot navigation. Through expert interviews and surveys, we identify and weigh key features contributing to R for wheeled robots on urban streets. Our findings reveal that pedestrian density, crowd dynamics and pedestrian flow are the most critical factors, collectively accounting for 28% of the total score. Computing robotability across New York City yields significant variation; the area of highest R is 3.0 times more "robotable" than the area of lowest R. Deployments of a physical robot on high and low robotability areas show the adequacy of the score in anticipating the ease of robot navigation. This new framework for evaluating urban landscapes aims to reduce uncertainty in robot deployment while respecting established mobility patterns and urban planning principles, contributing to the discourse on harmonious human-robot environments.
"All Roads Lead to ChatGPT": How Generative AI is Eroding Social Interactions and Student Learning Communities
Hou, Irene, Man, Owen, Hamilton, Kate, Muthusekaran, Srishty, Johnykutty, Jeffin, Zadeh, Leili, MacNeil, Stephen
The widespread adoption of generative AI is already impacting learning and help-seeking. While the benefits of generative AI are well-understood, recent studies have also raised concerns about increased potential for cheating and negative impacts on students' metacognition and critical thinking. However, the potential impacts on social interactions, peer learning, and classroom dynamics are not yet well understood. To investigate these aspects, we conducted 17 semi-structured interviews with undergraduate computing students across seven R1 universities in North America. Our findings suggest that help-seeking requests are now often mediated by generative AI. For example, students often redirected questions from their peers to generative AI instead of providing assistance themselves, undermining peer interaction. Students also reported feeling increasingly isolated and demotivated as the social support systems they rely on begin to break down. These findings are concerning given the important role that social interactions play in students' learning and sense of belonging.
Findings of the BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora
Warstadt, Alex, Mueller, Aaron, Choshen, Leshem, Wilcox, Ethan, Zhuang, Chengxu, Ciro, Juan, Mosquera, Rafael, Paranjape, Bhargavi, Williams, Adina, Linzen, Tal, Cotterell, Ryan
Children can acquire language from less than 100 million words of input. Large language models are far less data-efficient: they typically require 3 or 4 orders of magnitude more data and still do not perform as well as humans on many evaluations. These intensive resource demands limit the ability of researchers to train new models and use existing models as developmentally plausible cognitive models. The BabyLM Challenge is a communal effort in which participants compete to optimize language model training on a fixed data budget. Submissions are compared on various evaluation tasks targeting grammatical ability, downstream task performance, and generalization. Participants can submit to up to three tracks with progressively looser data restrictions. From over 30 submissions, we extract concrete recommendations on how best to train data-efficient language models, and on where future efforts should (and perhaps should not) focus. The winning submissions using the LTG-BERT architecture (Samuel et al., 2023) outperformed models trained on trillions of words. Other submissions achieved strong results through training on shorter input sequences or training a student model on a pretrained teacher. Curriculum learning attempts, which accounted for a large number of submissions, were largely unsuccessful, though some showed modest improvements.
How The Last of Us Fans Turned Against Its Breakout Star
By pretty much every objective measure, HBO's adaptation of the hit postapocalyptic video game The Last of Us has been a roaring success. Never before has a video game narrative been molded into Emmy nominations and such warm reception among respectable critics, industry darlings, and people who have no idea what the term "one-shotting" means. You'd think that the devotees who first fell in love with the game back when it was originally released in 2013 would be toasting the cultural ascendance of their favorite medium--and especially how the story's complicated morality has impacted those who've never picked up a controller. And yet, for as long as the show has been on television, its most dogmatic fans have been caught up in a controversy of much inferior consequence: Specifically, they're furious that Bella Ramsey doesn't look much like Ellie. On the most basic level, this observation is correct.