 Large Language Model


Sam Altman says Elon Musk wanted 90 percent of OpenAI in high-stakes trial

Al Jazeera

In a United States court, OpenAI chief executive Sam Altman has rejected claims from fellow tech mogul Elon Musk that he betrayed the artificial intelligence company's original vision. Tuesday marked the start of Altman's testimony in a contentious trial unfolding in Oakland, California, between some of tech's richest and most powerful titans. Musk has alleged that OpenAI's leader persuaded him to invest $38m, based on a goal of improving humanity, only to see the company pivot to a for-profit venture in 2019. On the witness stand on Tuesday, Altman instead framed Musk as a competitor obsessed with exercising control over OpenAI. "It does not fit with my conception of the words 'stealing a charity' to look at what has actually happened here," Altman told the court.


Sam Altman defends OpenAI in courtroom showdown with Elon Musk

The Guardian

Sam Altman is questioned by OpenAI's attorney, Bill Savitt, before Yvonne Gonzalez Rogers, a US district judge, at a federal courthouse in Oakland, California, on 12 May 2026 in a courtroom sketch. The OpenAI CEO, Sam Altman, took the stand on Tuesday to defend himself and his company against a lawsuit by Elon Musk. Altman is set to be one of the final witnesses in the trial, which has pitted two of the tech industry's most powerful men against each other in a dramatic courtroom showdown. Musk has accused Altman and OpenAI of breaking the AI firm's founding agreement by restructuring it into a for-profit enterprise, alleging that Altman essentially swindled him into co-founding the company and providing tens of millions in financial backing.


Elon Musk said control of OpenAI should go to his children, Sam Altman tells jury

BBC News

Elon Musk tried to take control of OpenAI, even suggesting it could pass to his children when he dies, Sam Altman said on Tuesday. Altman is co-founder and chief executive of the artificial intelligence (AI) company behind ChatGPT. He is being sued by Musk, who accuses him of having looted a charity, given that OpenAI began as a non-profit. Appearing before a federal jury in Oakland, California, Altman said Musk not only backed the idea of OpenAI becoming a for-profit business but also wanted control of it for the long run. "A particularly hair-raising moment was when my cofounders asked, 'If you have control, what happens when you die?'" he said.


AI voice chat sucks. This startup thinks it's cracked it

PCWorld

PCWorld reports that Thinking Machines, founded by ex-OpenAI executive Mira Murati, has developed new AI voice interaction models that enable real-time conversations with interruptions and visual cue recognition. The technology uses a dual-AI system with a fast interaction model and a background model for complex tasks, employing a multi-stream, micro-turn approach. This advancement could transform AI voice chat from today's CB radio-style turn-taking into natural, human-like conversation, though the technology remains in the research phase. Voice chatting with today's AI can feel as stilted as an old-school CB radio exchange, where you're forced to take turns as you talk. "Hey ChatGPT, let's talk about the movies!


ChatGPT is $20/month, but one AI platform gives you GPT, Claude, and Gemini for a year for $30

PCWorld

You can get access to ChatGPT, Claude, and Gemini through ChatOn AI Assistant for just $30. Juggling AI subscriptions can get expensive fast. A single AI subscription can cost hundreds per year, and using multiple tools only drives the price higher. That's part of why ChatOn AI Assistant has been gaining attention recently.


Daybreak is OpenAI's response to Anthropic's Claude Mythos

Engadget

OpenAI has just launched Daybreak, a cybersecurity initiative that's clearly the company's competitor to Anthropic's Project Glasswing. If you'll recall, Glasswing uses Anthropic's unreleased AI model, Claude Mythos Preview, to serve its clients' cyber defense needs. It's been promising so far: Mozilla revealed in April that Mythos helped it find and patch 271 vulnerabilities in the latest release of the Firefox browser. OpenAI says Daybreak uses its various AI models, including its specialized security agent Codex. In its announcement, the company explained that Daybreak is built around the premise that cyber defense should be built into software from the start and not just revolve around finding and fixing vulnerabilities.


Generative Synthetic Data for Causal Inference: Pitfalls, Remedies, and Opportunities

arXiv.org Machine Learning

Synthetic tabular data are often evaluated by distributional similarity, privacy distance, or train-on-synthetic-test-on-real predictive performance, but these criteria do not ensure validity for causal inference. We show that fully generative tabular synthesizers, including GAN- and LLM-based models, can preserve predictive utility while distorting average treatment effect (ATE) estimates. The failure is structural: ATE preservation requires both a realistic covariate law and an accurate treatment-effect contrast, whereas prediction loss penalizes treatment-effect error only through an overlap-weighted term. We formalize this mismatch through sensitivity and loss-decomposition results, and identify an analogous decomposition in block-level next-token prediction under log loss. Motivated by the tabular causal analysis, we propose a hybrid synthetic-data framework that generates covariates while modeling treatment and outcome mechanisms separately, allowing causal-purpose treatment assignment such as randomized synthetic assignment. We evaluate this framework in three settings: ATE preservation under fully generative versus hybrid synthesis, targeted augmentation for practical positivity problems, and synthetic simulation engines for comparing OR, IPW, AIPW, and TMLE before real-data analysis. Across synthetic and ACTG experiments, hybrid synthesis improves causal fidelity relative to fully generative baselines; LLM-based hybrid synthesis is often more faithful than CTGAN for ATE preservation and finite-sample estimator benchmarking.
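
For readers unfamiliar with the estimators named in the abstract, the following is a minimal sketch of the textbook IPW (Horvitz-Thompson) and AIPW formulas for the ATE. It is not the paper's code: the function names, the toy data, and the assumption that propensities and outcome-model predictions are supplied by the caller are all illustrative choices.

```python
def ipw_ate(treatments, outcomes, propensities):
    """Horvitz-Thompson IPW estimate of the ATE.

    treatments: 0/1 indicators; propensities: P(T=1 | X) for each unit.
    """
    n = len(outcomes)
    total = 0.0
    for t, y, e in zip(treatments, outcomes, propensities):
        total += t * y / e - (1 - t) * y / (1 - e)
    return total / n


def aipw_ate(treatments, outcomes, propensities, mu1, mu0):
    """Augmented IPW: outcome-model contrast plus inverse-weighted residuals.

    mu1/mu0 are caller-supplied predictions of the outcome under
    treatment and control (hypothetical inputs for this sketch).
    """
    n = len(outcomes)
    total = 0.0
    for t, y, e, m1, m0 in zip(treatments, outcomes, propensities, mu1, mu0):
        total += (m1 - m0) + t * (y - m1) / e - (1 - t) * (y - m0) / (1 - e)
    return total / n


# Toy data with randomized assignment (propensity 0.5 for every unit).
T = [1, 0, 1, 0]
Y = [3.0, 1.0, 5.0, 1.0]
E = [0.5, 0.5, 0.5, 0.5]
ipw_estimate = ipw_ate(T, Y, E)
aipw_estimate = aipw_ate(T, Y, E, mu1=[4.0] * 4, mu0=[1.0] * 4)
```

With a known, constant propensity, as in randomized synthetic assignment, the IPW estimate reduces to a difference in group means, which is one reason a hybrid synthesizer that controls assignment can be easier to audit.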


Addressing Performance Saturation for LLM RL via Precise Entropy Curve Control

arXiv.org Machine Learning

Reinforcement learning (RL) has enabled complex reasoning abilities in large language models (LLMs). However, most RL algorithms suffer from performance saturation, preventing continued gains as RL training scales. This problem can be characterized by the collapse of entropy, a key diagnostic for exploration in RL. Existing attempts focus on preventing entropy collapse through regularization or clipping. However, their resulting entropy curves often exhibit instability in the long term, which hinders performance gains. In this paper, we introduce Entrocraft, a simple rejection-sampling approach that realizes a user-customized entropy schedule by biasing the advantage distributions. Entrocraft requires no objective regularization and is advantage-estimator-agnostic. Theoretically, we relate per-step entropy change to the advantage distribution under minimal assumptions. This explains the behavior of existing RL and entropy-preserving methods. Entrocraft also enables a systematic study of entropy schedules, which reveals that linear annealing, which starts high and decays to a slightly lower target, performs best. Empirically, Entrocraft addresses performance saturation, significantly improving generalization, output diversity, and long-term training. It enables a 4B model to outperform an 8B baseline, sustains improvement for up to 4x longer before plateauing, and raises pass@K by 50% over the baseline.
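
To make the idea of a linearly annealed entropy schedule concrete, here is a small sketch of the two quantities involved: a target-entropy curve and the entropy of a policy's next-token distribution. It does not implement Entrocraft's rejection sampling, which the abstract does not specify; the function names and the example values 0.8 and 0.6 are hypothetical.

```python
import math


def linear_entropy_target(step, total_steps, start, end):
    """Linearly interpolate the target entropy from `start` to `end`."""
    frac = min(max(step / total_steps, 0.0), 1.0)  # clamp progress to [0, 1]
    return start + frac * (end - start)


def token_entropy(logits):
    """Shannon entropy (in nats) of softmax(logits), computed stably."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    z = sum(exps)
    probs = [v / z for v in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Hypothetical schedule: start at 0.8 nats and decay to 0.6 nats.
targets = [linear_entropy_target(s, 100, 0.8, 0.6) for s in (0, 50, 100)]
uniform_entropy = token_entropy([0.0, 0.0])
```

A training loop could compare the measured `token_entropy` against the scheduled target at each step; how that gap is then used to bias the advantage distribution is specific to the paper.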


Asymptotically Log-Optimal Bayes-Assisted Confidence Sequences for Bounded Means

arXiv.org Machine Learning

Confidence sequences based on test martingales provide time-uniform uncertainty quantification for the mean of bounded IID observations without parametric distributional assumptions. Their practical efficiency, however, depends strongly on the choice of martingale updates, and many existing constructions do not exploit prior information about plausible data-generating distributions or mean values. We propose a Bayes-assisted framework that uses a Bayesian working predictive model to adaptively construct confidence sequences. For each candidate mean and time point, the predictive distribution selects, among valid one-step martingale factors, the update maximising predictive expected log-growth; validity is therefore preserved even when the prior or working model is misspecified. We prove that if the predictive distribution is Wasserstein-consistent, the resulting procedure is asymptotically log-optimal, matching the per-sample log-growth of an oracle procedure with access to the true distribution. We instantiate the framework using robust predictives based on Dirichlet-process mixtures and Bayesian exponentially tilted empirical likelihood. Experiments on synthetic data, sequential best-arm identification for LLM evaluation, and prediction-powered inference show that informative priors can substantially reduce confidence-sequence width and sampling effort while retaining anytime-valid coverage.
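
As background for the test-martingale construction, the sketch below builds a basic confidence set for the mean of observations in [0, 1] by betting against each candidate mean on a grid. It is a plain, non-Bayesian illustration: the fixed bet fraction `bet` stands in for the paper's predictive-model-driven choice of update, and the function name, grid size, and toy data are assumptions.

```python
def betting_confidence_set(xs, alpha=0.05, grid_size=99, bet=0.5):
    """Candidate means in (0, 1) not rejected by a two-sided test martingale.

    For each candidate m, wealth is multiplied by 1 + lam * (x - m) per
    observation. With 0 <= bet <= 1 both factors below stay nonnegative
    for x in [0, 1], so under the null the wealth is a nonnegative
    martingale and Ville's inequality bounds P(wealth ever >= 1/alpha).
    """
    grid = [(i + 1) / (grid_size + 1) for i in range(grid_size)]
    kept = []
    for m in grid:
        up = 1.0    # bets that the true mean exceeds m
        down = 1.0  # bets that the true mean is below m
        for x in xs:
            up *= 1.0 + (bet / m) * (x - m)
            down *= 1.0 - (bet / (1.0 - m)) * (x - m)
        if 0.5 * up + 0.5 * down < 1.0 / alpha:
            kept.append(m)
    return kept


# Toy data: 100 observations all equal to 0.5.
kept_means = betting_confidence_set([0.5] * 100)
```

Because validity comes from the martingale property alone, any rule for choosing the bet that depends only on past data keeps the coverage guarantee; a better rule only makes the set shrink faster.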


A Semantic-Sampling Framework for Evaluating Calibration in Open-Ended Question Answering

arXiv.org Machine Learning

Calibration measures whether a model's predicted confidence aligns with its empirical accuracy, and is central to the reliable deployment of large language models (LLMs) in high-stakes domains such as medicine and law. While much recent work focuses on improving LLM calibration, the equally important question of how to evaluate it in realistic settings remains underdeveloped. Open-ended question answering (QA), the most common deployment setting for modern LLMs, is where existing evaluation methods fall short: logit-based metrics need restricted output formats and internal probabilities; verbalized confidence is self-reported and often overconfident; and sampling-based methods rely on task-specific extraction rules without a clear finite-sample target. We introduce Sem-ECE (Semantic-Sampling Expected Calibration Error), a calibration evaluation framework for open-ended QA that samples answers from the model, groups them into semantic classes, and uses the resulting frequencies as confidence. We study two estimators within this framework: Sem$_1$-ECE, the same-sample self-consistency score, and Sem$_2$-ECE, a held-out variant that separates answer selection from confidence evaluation. We prove both are asymptotically unbiased, and further show that they agree on easy questions but diverge on hard ones with Sem$_2$ achieving strictly smaller calibration error, so their gap also serves as a diagnostic for question difficulty. Experiments on three open-ended QA benchmarks across five leading commercial LLMs match our theoretical predictions and show that Sem-ECE outperforms verbalized confidence and existing sampling-based methods, while complementing logit-based evaluation when internal probabilities are unavailable.
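
The sample-group-count recipe described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: `normalize` is a hypothetical stand-in for semantic grouping (it only folds case and whitespace), the confidence is the same-sample majority frequency, and the equal-width binning is the standard ECE recipe.

```python
from collections import Counter


def normalize(answer):
    """Hypothetical stand-in for semantic grouping: fold case and whitespace."""
    return " ".join(answer.lower().split())


def self_consistency(samples):
    """Return the majority class and its sample frequency as confidence."""
    counts = Counter(normalize(s) for s in samples)
    top_class, top_count = counts.most_common(1)[0]
    return top_class, top_count / len(samples)


def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: bin-weighted mean of |accuracy - mean confidence|."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes to last bin
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += len(members) / total * abs(accuracy - avg_conf)
    return ece


answer, confidence = self_consistency(["Paris", " paris ", "Lyon", "PARIS"])
ece = expected_calibration_error([0.75, 0.75], [True, False])
```

A held-out variant in the spirit of the abstract's second estimator would pick the answer from one batch of samples and measure its frequency on a separate batch, so that selection and confidence estimation do not share data.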