Large Language Model
Unsupervised Text Segmentation via Kernel Change-Point Detection on Sentence Embeddings
Jia, Mumin, Diaz-Rodriguez, Jairo
Unsupervised text segmentation is crucial because boundary labels are expensive, subjective, and often fail to transfer across domains and granularity choices. We propose Embed-KCPD, a training-free method that represents sentences as embedding vectors and estimates boundaries by minimizing a penalized KCPD objective. Beyond the algorithmic instantiation, we develop, to our knowledge, the first dependence-aware theory for KCPD under $m$-dependent sequences, a finite-memory abstraction of short-range dependence common in language. We prove an oracle inequality for the population penalized risk and a localization guarantee showing that each true change point is recovered within a window that is small relative to segment length. To connect theory to practice, we introduce an LLM-based simulation framework that generates synthetic documents with controlled finite-memory dependence and known boundaries, validating the predicted scaling behavior. Across standard segmentation benchmarks, Embed-KCPD often outperforms strong unsupervised baselines. A case study on Taylor Swift's tweets illustrates that Embed-KCPD combines strong theoretical guarantees, simulated reliability, and practical effectiveness for text segmentation.
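The penalized objective described in the abstract above can be illustrated with a small sketch. This is not the authors' implementation: it assumes an RBF kernel, a constant per-boundary penalty, and an exact O(n^2) dynamic program, and the names `kcpd_boundaries`, `penalty`, and `gamma` are hypothetical. The rows of `X` stand in for sentence embeddings.

```python
import numpy as np


def kcpd_boundaries(X, penalty, gamma=1.0):
    """Return sorted change-point indices for the rows of X (n x d).

    Minimizes (sum of within-segment kernel scatter) + penalty * (#change points).
    Assumptions: RBF kernel and a constant penalty, chosen only for illustration.
    """
    n = X.shape[0]
    # RBF Gram matrix from pairwise squared distances.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    K = np.exp(-gamma * d2)

    # 2D prefix sums so that any block sum of K costs O(1).
    S = np.zeros((n + 1, n + 1))
    S[1:, 1:] = K.cumsum(axis=0).cumsum(axis=1)
    diag = np.concatenate(([0.0], np.cumsum(np.diag(K))))

    def cost(a, b):
        # Kernel scatter of the segment covering indices a .. b-1.
        block = S[b, b] - S[a, b] - S[b, a] + S[a, a]
        return (diag[b] - diag[a]) - block / (b - a)

    best = np.full(n + 1, np.inf)
    best[0] = -penalty  # the first segment does not pay a boundary penalty
    prev = np.zeros(n + 1, dtype=int)
    for b in range(1, n + 1):
        for a in range(b):
            c = best[a] + cost(a, b) + penalty
            if c < best[b]:
                best[b], prev[b] = c, a

    # Backtrack through the stored segment starts.
    cps, b = [], n
    while b > 0:
        a = prev[b]
        if a > 0:
            cps.append(int(a))
        b = a
    return sorted(cps)
```

A call such as `kcpd_boundaries(embeddings, penalty=1.0, gamma=0.5)` returns the indices where a new segment starts. A larger `penalty` yields fewer boundaries, which is the usual trade-off in penalized change-point methods.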
A Universal Load Balancing Principle and Its Application to Large Language Model Serving
Chen, Zixi, Bu, Tianci, Song, Chendong, Lu, Xin, Ye, Yinyu, Zhou, Zijie
Load balancing (the allocation of work across parallel resources to reduce delay, energy and cost) is a pervasive challenge in science and engineering, from large-scale simulation and data processing to cloud and manufacturing operations. Motivated by the emerging bottleneck in large language model (LLM) serving, we study a particularly stringent regime of load balancing that arises in barrier-synchronized, stateful systems: work cannot be freely migrated and progress is gated by the slowest participant at each step, so heterogeneity and temporal drift in workloads create persistent stragglers and substantial idle time. LLM serving under data-parallel decoding provides a prominent modern instance: in production traces, barrier-induced idle can exceed 40% of compute time per decode step. Here we develop a universal load-balancing principle, which admits a step-wise finite-horizon integer-optimization formulation and yields worst-case guarantees: across LLM decode models and a broader class of non-decreasing workload drift processes, it reduces long-run imbalance by a factor that grows with batch size and system scale. Extensive experiments corroborate the theory, showing substantial improvements in throughput and latency together with reductions in energy consumption. These results provide a general, theoretically grounded framework for load balancing, with immediate implications for sustainable LLM serving and broad relevance to other synchronization-gated resource-allocation problems.
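To make the barrier effect in the abstract above concrete, the sketch below computes the idle fraction of one synchronized step, where every worker waits for the slowest one. It then compares a naive round-robin placement with the classic greedy "largest job to least-loaded worker" heuristic. The heuristic is a stand-in chosen for illustration, not the principle proposed in the paper, and all function names are hypothetical.

```python
import heapq


def idle_fraction(loads):
    """Fraction of total worker-time spent waiting at the barrier in one step."""
    slowest = max(loads)
    if slowest == 0:
        return 0.0
    return 1.0 - sum(loads) / (len(loads) * slowest)


def round_robin(jobs, n_workers):
    """Assign jobs to workers in arrival order, ignoring job size."""
    loads = [0.0] * n_workers
    for i, job in enumerate(jobs):
        loads[i % n_workers] += job
    return loads


def greedy_least_loaded(jobs, n_workers):
    """Assign jobs largest-first, each to the currently least-loaded worker."""
    heap = [(0.0, w) for w in range(n_workers)]
    heapq.heapify(heap)
    for job in sorted(jobs, reverse=True):
        load, w = heapq.heappop(heap)
        heapq.heappush(heap, (load + job, w))
    loads = [0.0] * n_workers
    for load, w in heap:
        loads[w] = load
    return loads


jobs = [8, 1, 7, 2, 6, 3, 5, 4]  # made-up per-request work units
print(idle_fraction(round_robin(jobs, 2)))
print(idle_fraction(greedy_least_loaded(jobs, 2)))
```

Note that this sketch places each job once at admission. The stateful setting in the abstract, where work cannot be freely migrated and workloads drift over time, is harder than this one-shot picture suggests.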
"Rebuilding" Statistics in the Age of AI: A Town Hall Discussion on Culture, Infrastructure, and Training
Donoho, David L., Kang, Jian, Lin, Xihong, Mukherjee, Bhramar, Nettleton, Dan, Nugent, Rebecca, Rodriguez, Abel, Xing, Eric P., Zheng, Tian, Zhu, Hongtu
This article presents the full, original record of the 2024 Joint Statistical Meetings (JSM) town hall, "Statistics in the Age of AI," which convened leading statisticians to discuss how the field is evolving in response to advances in artificial intelligence, foundation models, large-scale empirical modeling, and data-intensive infrastructures. The town hall was structured around open panel discussion and extensive audience Q&A, with the aim of eliciting candid, experience-driven perspectives rather than formal presentations or prepared statements. This document preserves the extended exchanges among panelists and audience members, with minimal editorial intervention, and organizes the conversation around five recurring questions concerning disciplinary culture and practices, data curation and "data work," engagement with modern empirical modeling, training for large-scale AI applications, and partnerships with key AI stakeholders. By providing an archival record of this discussion, the preprint aims to support transparency, community reflection, and ongoing dialogue about the evolving role of statistics in the data- and AI-centric future.
Artificial Entanglement in the Fine-Tuning of Large Language Models
Chen, Min, Wang, Zihan, Chen, Canyu, Wu, Zeguan, Li, Manling, Liu, Junyu
Large language models (LLMs) can be adapted to new tasks using parameter-efficient fine-tuning (PEFT) methods that modify only a small number of trainable parameters, often through low-rank updates. In this work, we adopt a quantum-information-inspired perspective to understand their effectiveness. From this perspective, low-rank parameterizations naturally correspond to low-dimensional Matrix Product States (MPS) representations, which enable entanglement-based characterizations of parameter structure. Thereby, we term and measure "Artificial Entanglement", defined as the entanglement entropy of the parameters in artificial neural networks (in particular the LLMs). We first study the representative low-rank adaptation (LoRA) PEFT method, alongside full fine-tuning (FFT), using LLaMA models at the 1B and 8B scales trained on the Tulu3 and OpenThoughts3 datasets, and uncover: (i) Internal artificial entanglement in the updates of query and value projection matrices in LoRA follows a volume law with a central suppression (termed as the "Entanglement Valley"), which is sensitive to hyper-parameters and is distinct from that in FFT; (ii) External artificial entanglement in attention matrices, corresponding to token-token correlations in representation space, follows an area law with logarithmic corrections and remains robust to LoRA hyper-parameters and training steps. Drawing a parallel to the No-Hair Theorem in black hole physics, we propose that although LoRA and FFT induce distinct internal entanglement signatures, such differences do not manifest in the attention outputs, suggesting a "no-hair" property that results in the effectiveness of low rank updates. We further provide theoretical support based on random matrix theory, and extend our analysis to an MPS Adaptation PEFT method, which exhibits qualitatively similar behaviors.
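As a rough illustration of the quantity named in the abstract above, the sketch below computes a von Neumann entanglement entropy from the singular values of a matrix, treating its row and column indices as the two halves of a bipartite state. This is one standard definition and is assumed here; the paper's MPS-based construction and choice of cuts may differ. The function name `entanglement_entropy` and the toy shapes are hypothetical.

```python
import numpy as np


def entanglement_entropy(M, eps=1e-12):
    """Von Neumann entropy of the normalized squared singular values of M.

    Assumption: M is viewed as a bipartite state (rows vs. columns), so the
    squared singular values play the role of Schmidt coefficients.
    """
    s = np.linalg.svd(M, compute_uv=False)
    p = s ** 2
    total = p.sum()
    if total == 0.0:
        return 0.0  # an all-zero matrix carries no weight to distribute
    p = p / total
    p = p[p > eps]  # drop numerically zero modes before taking the log
    return float(-(p * np.log(p)).sum())


# A rank-r update, as in LoRA, has at most r nonzero singular values,
# so its entropy under this definition is bounded by log(r).
rng = np.random.default_rng(0)
B = rng.normal(size=(16, 4))
A = rng.normal(size=(4, 16))
print(entanglement_entropy(B @ A), np.log(4))
```

The bound in the last comment follows because an entropy over at most `r` nonzero probabilities cannot exceed `log(r)`.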
Inside OpenAI's big play for science
An exclusive conversation with Kevin Weil, head of OpenAI for Science, a new in-house team that wants to make scientists more productive. In the three years since ChatGPT's explosive debut, OpenAI's technology has upended a remarkable range of everyday activities at home, at work, in schools--anywhere people have a browser open or a phone out, which is everywhere. Now OpenAI is making an explicit play for scientists. In October, the firm announced that it had launched a whole new team, called OpenAI for Science, dedicated to exploring how its large language models could help scientists and tweaking its tools to support them. The last couple of months have seen a slew of social media posts and academic publications in which mathematicians, physicists, biologists, and others have described how LLMs (and OpenAI's GPT-5 in particular) have helped them make a discovery or nudged them toward a solution they might otherwise have missed. In part, OpenAI for Science was set up to engage with this community.
How to generate AI images using ChatGPT
A good prompt goes a long way. ChatGPT is available on both iOS and Android. Since March 2025, ChatGPT has been capable of generating images. Following a period when the feature briefly wasn't available to free users, you now don't even need to pay for one of OpenAI's subscriptions to use it.
Trump admin reportedly plans to use AI to write federal regulations
The DOT's top lawyer said they 'don't need the perfect rule' and that they just 'want good enough.' The Trump administration is planning on using Google Gemini to draft important federal regulations. This is starting with the Department of Transportation, according to interviews with agency staffers. Regulations created by the DOT help keep us safe when traveling.
ChatGPT is now indexing Grok's AI slop
PCWorld reports that ChatGPT 5.2 is now indexing Grokipedia, xAI's AI-generated encyclopedia known for inaccuracies and conspiracy theories. This creates a concerning feedback loop where AI-generated misinformation spreads between major language models, potentially overwriting established knowledge. The integration poses significant risks to information integrity as biased or false content from one AI system influences another's responses. More and more of the web is filling up with LLM-generated text, images, and even videos and music. It's an even bigger problem than it seems because the "AI" systems that have scoured the web to generate their large language models are now .
A.I. Was Supposed to "Revolutionize" Work. In Many Offices, It's Only Creating Chaos.
Although we've been told that A.I. is poised to "revolutionize" work, at the moment it seems to be doing something else entirely: spreading chaos. All throughout American offices, A.I. platforms like ChatGPT are delivering answers that sound right even when they aren't, transcription tools that turn meetings into works of fiction, and documents that look polished on the surface but are riddled with factual errors and missing nuance. If you've read anything about A.I., you know that it sometimes "hallucinates" facts that simply aren't true, yet asserts them with so much confidence that its lies don't get caught. Clearly, there's more work to do on this emerging technology, but in the meantime, it's ravaging some workplaces.
The Download: why LLMs are like aliens, and the future of head transplants
How large is a large language model? We now coexist with machines so vast and so complicated that nobody quite understands what they are, how they work, or what they can really do--not even the people who build them. Even though nobody fully understands how it works--and thus exactly what its limitations might be--hundreds of millions of people now use this technology every day. To help overcome our ignorance, researchers are studying LLMs as if they were doing biology or neuroscience on vast living creatures--city-size xenomorphs that have appeared in our midst. And they're discovering that large language models are even weirder than they thought. The Italian neurosurgeon Sergio Canavero has been preparing for a surgery that might never happen.