Large Language Model
The ChatGPT desktop app for Mac just got hit with a security breach
OpenAI's ChatGPT app for Mac just experienced a security breach involving two employee devices, according to a report by . The company is issuing a software update to users that's rolling out now, but it won't arrive for everyone until June 12. The why of it all is a bit convoluted, stemming from a security issue involving open-source code: a widely used open-source library was compromised, and two devices at the company were impacted. "Upon identification of the malicious activity, we worked quickly to investigate, contain and take steps to protect our systems," the company wrote in a blog post.
Reflections from #AIES2025
In this piece, we reflect on AIES 2025 and outline the conversations and presentations from a discussion session on LLMs in the context of clinical usage and human rights. This is a crosspost from the latest issue of AI Matters, published by ACM SIGAI. This year's Conference on Artificial Intelligence, Ethics, and Society (AIES) took place in the north of Madrid, within the 180m-high tower block that forms the vertical campus of IE University. The event kicked off with a welcome from the chairs and organising committee members, and this opening session also featured the conference best paper awards. Topics covered during the three-day event included mitigating bias, integrating AI into the workplace, evaluating LLMs in clinical settings, power dynamics in AI ecosystems, and dataset creation.
The '80/20' ChatGPT prompt is the fastest way to learn anything
Use the 80/20 prompt technique to get up to speed in minutes. We've all been there: that moment when you realize you're in way over your head. For me, it happened during my first briefing with a smart light vendor, when it became painfully obvious that I couldn't tell a standard A19 bulb from a BR30 can light. Clearly I needed help, fast.
The Elon Musk v Sam Altman battle is a distraction | Karen Hao
'If OpenAI lost its footing as the AI industry frontrunner, another barely distinguishable competitor - Musk's xAI or other - would simply replace it.' If it wasn't already clear, Elon Musk and Sam Altman hate each other. While the two men were once cofounders of OpenAI, they're now locked in a vicious feud, playing out in all its theatrics in front of a judge and jury in a California courtroom. Musk is suing, alleging that Altman and OpenAI president Greg Brockman tricked him into forming and funding the organization as a non-profit before they subsequently restructured it to have a for-profit entity.
OpenAI floats idea of global AI governance body with U.S. and China
The U.S. has an opportunity to use its lead in artificial intelligence technology to create a global governance mechanism to ensure safer, more resilient systems, OpenAI's vice president of global affairs, Chris Lehane, said. OpenAI would support the creation of a global governance body for artificial intelligence led by the U.S. and including China as a member, a top company executive said, hours before the start of U.S. President Donald Trump's high-stakes meeting with Chinese President Xi Jinping. When asked about the China summit, OpenAI's vice president of global affairs, Chris Lehane, said Wednesday that the U.S. has an opportunity to use its lead in AI technology to create a global governance mechanism resulting in safer, more resilient systems. "AI, in some level, transcends a lot of the prevailing or traditional trade type of issues," Lehane told reporters during a briefing at the company's offices in Washington. "There is an opportunity to really start to build something up globally, and have countries around the world, including China, potentially participate."
Microsoft is retiring Copilot Mode on Edge, because everything is Copilot Mode now
Microsoft is retiring Copilot Mode on Edge, because its features are now built directly into the browser for both desktop and mobile. If you'll recall, Microsoft started testing Copilot Mode on Edge in July last year, allowing you to use it to search for information across multiple open browser tabs and to analyze the details on each page. Now, the feature is available not just on desktop but also on Edge for mobile. Just ask Copilot a question or give it a command, such as "Compare the smart TVs across all my open tabs," and it will pull info from your tabs to give you a structured, side-by-side comparison. After the initial testing of Copilot Mode, Microsoft rolled out Journeys, which lets you save projects to revisit in the future. It's now also available for free on mobile, so you can pick up planning trips or making purchases where you left off days or weeks ago.
Uncovering Symmetry Transfer in Large Language Models via Layer-Peeled Optimization
Du, Zhehang, He, Hangfeng, Su, Weijie
Large language models (LLMs) are pretrained by minimizing the cross-entropy loss for next-token prediction. In this paper, we study whether this optimization strategy can induce geometric structure in the learned model weights and context embeddings. We approach this problem by analyzing a constrained layer-peeled optimization program, which serves as a mathematically tractable surrogate for LLMs by treating the output projection matrix and last-layer context embeddings as optimization variables. Our analysis of this nonconvex optimization program demonstrates that symmetries in the target next-token distributions are transferred to the global minimizers of the layer-peeled model in a precise group-theoretic sense. Specifically, we prove that when the target tokens exhibit a cyclic-shift symmetry (such as the seven days of the week or the twelve months of the year), the optimal logit matrix is exactly circulant, and the Gram matrices of both the output projections and the context embeddings form circulant geometries as well. Next, for exchangeable target distributions invariant under the symmetric group and, more generally, under two-transitive group actions, we show that the global optimal output projection matrix forms a simplex equiangular tight frame, while the optimal logit matrix and context embeddings inherit the permutation symmetries present in the input data. A key technical step is to reduce the constrained nonconvex factorized problem to an explicit logit-level convex characterization for cyclic symmetry and to a symmetry-based lower bound for permutation symmetry, together with a sharp characterization of the optimal factorization. Finally, we empirically demonstrate that open-source LLMs naturally exhibit symmetries consistent with our theoretical predictions, despite being trained without any explicit regularization promoting such geometric structure.
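To make the two geometries named above concrete, here is a small NumPy sketch. It does not solve the layer-peeled program. Instead, it constructs a hypothetical cyclic-shift-symmetric target (next-token distributions over the seven days of the week that depend only on the offset between days) and a simplex equiangular tight frame, then checks their defining properties. The helper names and the probability values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def circulant(first_row: np.ndarray) -> np.ndarray:
    """Build C with C[i, j] = first_row[(j - i) % n]: each row is the previous one shifted right by one."""
    n = len(first_row)
    return np.array([np.roll(first_row, i) for i in range(n)])

def is_circulant(M: np.ndarray, atol: float = 1e-9) -> bool:
    """True if every row of the square matrix M is row 0 cyclically shifted right by its index."""
    return all(np.allclose(M[i], np.roll(M[0], i), atol=atol) for i in range(M.shape[0]))

def simplex_etf(k: int) -> np.ndarray:
    """Return a k x k matrix whose columns form a simplex ETF: unit norm, pairwise inner product -1/(k-1)."""
    centered = np.eye(k) - np.ones((k, k)) / k   # project the standard basis onto the sum-zero subspace
    return np.sqrt(k / (k - 1)) * centered        # rescale so each column has unit norm

# Hypothetical cyclic-shift-symmetric targets over 7 weekdays: the next-token
# distribution depends only on the offset (j - i) mod 7 from the current day i.
offset_probs = np.array([0.05, 0.60, 0.15, 0.08, 0.05, 0.04, 0.03])
P = circulant(offset_probs)   # row i = target distribution after day i
logits = np.log(P)            # a row-wise softmax of log P recovers P exactly
W = simplex_etf(5)            # directions for 5 exchangeable classes

print(is_circulant(P), is_circulant(logits), is_circulant(P @ P.T))
print(np.round(W.T @ W, 3))
```

Because the ETF columns sum to zero, they span only a (k - 1)-dimensional subspace, and -1/(k-1) is the most negative inner product that k unit vectors can all share pairwise.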
When Should an AI Workflow Release? Always-Valid Inference for Black-Box Generate-Verify Systems
Cho, Young Hyun, Sun, Will Wei
LLM-enabled AI workflows increasingly produce outputs through iterative generate-evaluate-revise loops. Each iteration can improve the candidate, but it also creates a release decision: when to stop and output the current result? This raises a statistical challenge because deployment-time evaluator scores are adaptively generated and repeatedly monitored, yet the likelihood models or exchangeability assumptions typically used for calibration are unavailable. We propose an always-valid release wrapper for existing generator-evaluator pipelines. The wrapper builds a hard-negative reference pool of high-scoring failures, calibrates deployment-time evaluator scores against this pool, and accumulates the resulting evidence with an e-process. This separates two roles: the reference pool turns black-box scores into conservative evidence, while the e-process provides validity under optional stopping. In theory, we show that a conservative reference pool yields finite-sample control of the probability of releasing on infeasible tasks, that is, tasks for which the given workflow is not capable of producing a reliable solution. We also characterize conditions under which the same conservative rule still achieves nontrivial release on feasible tasks. In an MBPP+ coding-agent case study, the wrapper reduces premature incorrect release relative to baseline stopping rules while still releasing on tasks for which the workflow repeatedly accumulates moderate supporting evidence.
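The release logic described above can be approximated with standard tools. A rank-based (conformal-style) p-value compares each deployment score against the hard-negative pool. A p-to-e calibrator turns that p-value into an e-value. A running product of e-values then triggers release once it reaches 1/α, and Ville's inequality bounds the chance of this under the null. This is a minimal sketch assuming each round's p-value is conditionally super-uniform given the past when the task is infeasible. Reusing one fixed pool does not guarantee this in general, and the paper's actual construction may differ. The function names, the calibrator κ·p^(κ-1), and the example scores are all hypothetical.

```python
import numpy as np

def conformal_p_value(score, hard_negatives):
    """Rank a deployment score against a pool of high-scoring failures.

    Returns (1 + #{pool scores >= score}) / (n + 1). If the current candidate is a
    failure whose score is exchangeable with the pool, this p-value is super-uniform.
    """
    hard_negatives = np.asarray(hard_negatives, dtype=float)
    return (1 + np.sum(hard_negatives >= score)) / (len(hard_negatives) + 1)

def p_to_e(p, kappa=0.5):
    """Calibrator e = kappa * p**(kappa - 1); it integrates to 1 on [0, 1], so E[e] <= 1 for super-uniform p."""
    return kappa * p ** (kappa - 1)

def release_time(scores, hard_negatives, alpha=0.05, kappa=0.5):
    """Return the first round at which the running product of e-values reaches 1/alpha, else None.

    If the product is a nonnegative supermartingale under the null (infeasible task),
    Ville's inequality bounds the chance of ever releasing by alpha, even with optional stopping.
    """
    wealth = 1.0
    for t, score in enumerate(scores, start=1):
        wealth *= p_to_e(conformal_p_value(score, hard_negatives), kappa)
        if wealth >= 1 / alpha:
            return t
    return None

pool = np.arange(1, 20) / 20            # hypothetical hard-negative scores (19 past failures)
print(release_time([0.99] * 10, pool))  # scores above every known failure
print(release_time([0.5] * 50, pool))   # scores in the middle of the failure pool
```

Pushing κ toward 1 makes each e-value milder: strong rounds gain less, and weak rounds cost less. The value κ = 0.5 is an arbitrary middle choice for this sketch.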
LLMs as Implicit Imputers: Uncertainty Should Scale with Missing Information
Large language models (LLMs) are increasingly deployed in settings where the available context is incomplete or degraded. We argue that an LLM generating answers under incomplete context can be viewed as an implicit imputer, and evaluated against a criterion from the multiple imputation (MI) literature: uncertainty should scale with the amount of missing information. We assess this criterion on SQuAD, using a controlled framework in which context availability is varied across five levels. We evaluate two answer-level uncertainty measures that can be estimated from repeated sampling: sampling-based confidence (empirical mode frequency) and response entropy. Confidence fails to reflect increasing missingness: it remains high even as accuracy collapses. Entropy, by contrast, increases with context removal, consistent with the MI analogy, and explains substantially more variance in accuracy than confidence across all evidence levels (quadratic $R^2$ gap up to 0.057). We further introduce a black-box diagnostic $\rho_R(\alpha)$ that estimates the proportion of baseline uncertainty resolved by context level $\alpha$, requiring only repeated sampling with and without context. These results suggest that entropy is a more responsive black-box uncertainty measure than confidence under incomplete context.
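Both answer-level measures can be computed directly from a list of sampled answers. The exact definition of the diagnostic is not given here, so the `resolved_fraction` function below assumes one natural form purely for illustration: 1 - H(with context) / H(without context). The function names and the exact-string matching of answers are also assumptions, and the paper may normalize answers differently.

```python
import math
from collections import Counter

def mode_frequency(answers: list[str]) -> float:
    """Sampling-based confidence: share of samples equal to the most common answer."""
    return Counter(answers).most_common(1)[0][1] / len(answers)

def response_entropy(answers: list[str]) -> float:
    """Shannon entropy (nats) of the empirical distribution over distinct answers."""
    n = len(answers)
    return -sum((c / n) * math.log(c / n) for c in Counter(answers).values())

def resolved_fraction(no_context: list[str], with_context: list[str]) -> float:
    """Hypothetical rho: share of the no-context entropy removed by adding context (assumed form)."""
    h0 = response_entropy(no_context)
    if h0 == 0.0:
        return 0.0  # convention: no baseline uncertainty left to resolve
    return 1.0 - response_entropy(with_context) / h0

no_ctx = ["Paris", "Lyon", "Nice", "Paris"]     # hypothetical samples without context
full_ctx = ["Paris", "Paris", "Paris", "Lyon"]  # hypothetical samples with full context
print(mode_frequency(no_ctx), response_entropy(no_ctx), resolved_fraction(no_ctx, full_ctx))
```

Under this assumed form, the ratio goes negative whenever adding context increases entropy. A zero-entropy baseline is mapped to 0 by convention.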
Learning Perturbations to Extrapolate Your LLM
Cen, Zetai, Gu, Chenfei, Zhu, Jin, Li, Ting, Chen, Yunxiao, Shi, Chengchun
Training large language models (LLMs) such as GPT-5 and Qwen-3 (Singh et al., 2025; Yang et al., 2025) on massive text corpora aims at capturing the underlying distribution of natural language. Yet it remains challenging for the trained model to extrapolate to out-of-distribution or out-of-domain settings beyond the support of its training data. The literature has seen the development of various data perturbation techniques, such as synonym replacement, random insertion, deletion, and swap, that modify training instances into semantically similar variants to expose LLMs to a broader range of inputs and improve their ability to generalize beyond the training data (Feng et al., 2019, 2020; Li et al., 2024; Cen et al., 2026). However, these approaches remain grounded in discrete, word-level augmentation procedures such as those mentioned above, which may restrict their adaptivity across diverse domains. While discrete perturbations are simple to use, they can be too coarse and hard to refine due to the complexity of natural language (Park et al., 2022; Li et al., 2023). Meanwhile, fixed perturbations apply the same transformations to the data regardless of context, thus failing to generalize appropriately (Ismailov and Asanova, 2025).
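For concreteness, the discrete, word-level operations listed above (synonym replacement, random insertion, deletion, and swap) can be sketched in a few lines of Python. This illustrates the baseline style of augmentation that the passage contrasts against, not the authors' learned-perturbation method. The toy `SYNONYMS` table and all function names are hypothetical; a real pipeline would draw synonyms from a thesaurus.

```python
import random

# Hypothetical toy synonym table used only for illustration.
SYNONYMS = {"quick": ["fast", "rapid"], "model": ["network"], "learns": ["acquires", "picks up"]}

def synonym_replace(words, rng):
    """Replace one word that has a known synonym with a randomly chosen synonym."""
    candidates = [i for i, w in enumerate(words) if w in SYNONYMS]
    out = list(words)
    if candidates:
        i = rng.choice(candidates)
        out[i] = rng.choice(SYNONYMS[words[i]])
    return out

def random_insert(words, rng):
    """Insert a synonym of a random eligible word at a random position."""
    candidates = [w for w in words if w in SYNONYMS]
    out = list(words)
    if candidates:
        out.insert(rng.randrange(len(out) + 1), rng.choice(SYNONYMS[rng.choice(candidates)]))
    return out

def random_delete(words, rng, p=0.2):
    """Drop each word independently with probability p, keeping at least one word."""
    kept = [w for w in words if rng.random() >= p]
    return kept or [rng.choice(words)]

def random_swap(words, rng):
    """Swap the words at two distinct random positions."""
    out = list(words)
    if len(out) >= 2:
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out

rng = random.Random(0)
sentence = "the quick model learns from data".split()
for op in (synonym_replace, random_insert, random_delete, random_swap):
    print(op.__name__, " ".join(op(sentence, rng)))
```

Each operation applies the same context-free rule wherever it lands, which is the kind of fixed, discrete perturbation the passage describes as coarse and hard to adapt across domains.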