 Large Language Model






SoftBank swings to profit on valuation boost from OpenAI bet

The Japan Times

SoftBank CEO Masayoshi Son (left) and OpenAI CEO Sam Altman attend an event in Tokyo in February 2025.

SoftBank Group sprang back to a quarterly profit after investment gains from OpenAI neared $20 billion, a promising start for one of CEO Masayoshi Son's signature gambles alongside ByteDance and Alibaba Group Holding. The Tokyo-based company has invested about $34.6 billion in OpenAI, accumulating an 11% stake as of December, and has been in talks to invest as much as $30 billion more in a round that would value the startup at about $750 billion to $830 billion. As of December, SoftBank's investment gain on OpenAI stood at an estimated $19.8 billion, the company said Thursday.



Quantum Circuit Generation via test-time learning with large language models

arXiv.org Machine Learning

Large language models (LLMs) can generate structured artifacts, but using them as dependable optimizers for scientific design requires a mechanism for iterative improvement under black-box evaluation. Here, we cast quantum circuit synthesis as a closed-loop, test-time optimization problem: an LLM proposes edits to a fixed-length gate list, and an external simulator evaluates the resulting state with the Meyer-Wallach (MW) global entanglement measure. We introduce a lightweight test-time learning recipe that reuses prior high-performing candidates as an explicit memory trace, augments prompts with score-difference feedback, and applies restart-from-the-best sampling to escape potential plateaus. Across fixed 20-qubit settings, the loop without feedback and restart-from-the-best already improves random initial circuits over a range of gate budgets. To raise both the performance and the success rate, we use the full learning strategy. In the 25-qubit setting, it mitigates a pronounced performance plateau that appears under naive querying. Beyond raw scores, we analyze the structure of synthesized states and find that high-MW solutions can correspond to stabilizer or graph-state-like constructions, but full connectivity is not guaranteed because of the properties of the metric and the prompt design. These results illustrate both the promise and the pitfalls of memory- and evaluator-guided LLM optimization for circuit synthesis, highlighting the critical role of prior human-derived theoretical results in designing a custom tool that supports research.
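As an illustration of the evaluator side of such a loop, the sketch below computes the Meyer-Wallach measure of a pure n-qubit state from its amplitude vector, using the standard form Q = 2(1 - (1/n) Σ_k Tr ρ_k²), where ρ_k is the reduced state of qubit k. It is a minimal, assumption-laden example rather than the paper's simulator: the function names `single_qubit_purity` and `meyer_wallach`, the plain-list state representation, and the bit-ordering convention (qubit k is bit k of the basis index) are all choices made here for illustration.

```python
import math


def single_qubit_purity(state, n_qubits, k):
    """Return Tr(rho_k^2) for the reduced state of qubit k.

    Assumption: qubit k corresponds to bit k of the basis-state index.
    """
    mask = 1 << k
    rho = [[0j, 0j], [0j, 0j]]
    for idx in range(1 << n_qubits):
        if idx & mask:
            continue  # handle each (bit k = 0, bit k = 1) index pair once
        a0, a1 = state[idx], state[idx | mask]
        rho[0][0] += a0 * a0.conjugate()
        rho[0][1] += a0 * a1.conjugate()
        rho[1][0] += a1 * a0.conjugate()
        rho[1][1] += a1 * a1.conjugate()
    # rho is Hermitian, so Tr(rho^2) equals the sum of |rho_ij|^2.
    return sum(abs(rho[i][j]) ** 2 for i in range(2) for j in range(2))


def meyer_wallach(state):
    """Meyer-Wallach measure of a normalized pure state given as amplitudes."""
    n_qubits = len(state).bit_length() - 1
    if n_qubits < 1 or len(state) != 1 << n_qubits:
        raise ValueError("state length must be 2**n with n >= 1")
    mean_purity = sum(
        single_qubit_purity(state, n_qubits, k) for k in range(n_qubits)
    ) / n_qubits
    return 2.0 * (1.0 - mean_purity)


s = 1 / math.sqrt(2)
product_state = [1, 0, 0, 0]          # |00>, no entanglement
bell_state = [s, 0, 0, s]             # (|00> + |11>) / sqrt(2)
ghz_state = [s, 0, 0, 0, 0, 0, 0, s]  # (|000> + |111>) / sqrt(2)
```

A product state scores 0, while the Bell and GHZ states score 1 because every single-qubit reduced state is maximally mixed. The GHZ case shows why a high MW score alone does not distinguish between different entanglement structures. This direct loop costs O(n · 2^n) per evaluation, so it is only meant for small examples.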


Deriving Neural Scaling Laws from the statistics of natural language

arXiv.org Machine Learning

Although empirical neural scaling laws have substantially guided progress in large-scale machine learning, no existing theory can quantitatively predict the exponents of these laws for any modern LLM trained on any natural language dataset. We provide the first such theory in the case of data-limited scaling laws. We isolate two key statistical properties of language that alone can predict neural scaling exponents: (i) the decay of pairwise token correlations with time separation between token pairs, and (ii) the decay of the next-token conditional entropy with the length of the conditioning context. We further derive a simple formula in terms of these statistics that predicts data-limited neural scaling exponents from first principles without any free parameters or synthetic data models. Our theory exhibits a remarkable match with experimentally measured neural scaling laws obtained from training GPT-2-style and LLaMA-style models from scratch on two qualitatively different benchmarks, TinyStories and WikiText.
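To make statistic (ii) concrete, the sketch below estimates the next-token conditional entropy as a function of context length with a simple plug-in (count-based) estimator. This is an illustrative assumption, not the estimator used in the paper, which the abstract does not describe; the function name `conditional_entropy` and the toy sequence are hypothetical. Plug-in estimates are biased downward when contexts are long relative to the amount of data, so the example is only meaningful for short contexts or large corpora.

```python
import math
from collections import Counter


def conditional_entropy(tokens, context_len):
    """Plug-in estimate, in bits, of H(x_t | previous context_len tokens)."""
    joint = Counter()    # counts of (context, next token)
    context = Counter()  # counts of context alone
    for i in range(context_len, len(tokens)):
        ctx = tuple(tokens[i - context_len:i])
        joint[(ctx, tokens[i])] += 1
        context[ctx] += 1
    total = sum(context.values())
    if total == 0:
        raise ValueError("sequence is too short for this context length")
    entropy = 0.0
    for (ctx, _), count in joint.items():
        # Weight by p(context, token); take the log of p(token | context).
        entropy -= (count / total) * math.log2(count / context[ctx])
    return entropy


# Toy sequence: strictly alternating tokens, so one token of context
# fully determines the next one.
toy_tokens = list("ab" * 50)
entropy_by_context = {
    n: conditional_entropy(toy_tokens, n) for n in range(3)
}
```

On the toy sequence the estimate falls from 1 bit with no context to 0 bits with one token of context. For natural text, the quantity of interest is how slowly this curve decays as the context grows.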