AITopics | Large Language Model

Large Language Model

'Architects of AI' named Time Magazine's Person of the Year

BBC News, Dec-11-2025, 13:52:11 GMT

Time Magazine's Person of the Year for 2025 is not a single person. Instead, the magazine has recognised the year's most influential figure as the architects of artificial intelligence (AI). Nvidia boss Jensen Huang, Meta head Mark Zuckerberg, X owner Elon Musk and AI godmother Fei-Fei Li are among those depicted on one of the magazine's two covers. Experts say it highlights how quickly AI, and the firms behind it, are reshaping society. It comes as a boom in the technology, ushered in by OpenAI's launch of ChatGPT in late 2022, continues at pace.

large language model, machine learning, natural language, (17 more...)

BBC News

Country:

North America > United States (0.30)
North America > Central America (0.15)
Oceania > Australia (0.05)
(14 more...)

Industry:

Leisure & Entertainment (0.73)
Information Technology (0.70)
Media (0.50)
Government > Regional Government > Europe Government > United Kingdom Government (0.31)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.90)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.55)

The Download: solar geoengineering's future, and OpenAI is being sued

MIT Technology Review, Dec-11-2025, 13:10:00 GMT

Solar geoengineering aims to manipulate the climate by bouncing sunlight back into space. In theory, it could ease global warming. But as interest in the idea grows, so do concerns about potential consequences. A startup called Stardust Solutions recently raised a $60 million funding round, the largest known to date for a geoengineering startup. My colleague James Temple has a new story out about the company, and how its emergence is making some researchers nervous. So far, the field has been limited to debates, proposed academic research, and, sure, a few fringe actors to keep an eye on.

large language model, machine learning, natural language, (18 more...)

MIT Technology Review

Country:

Asia > Russia (0.15)
Asia > China (0.06)
North America > United States > New York (0.05)
(4 more...)

Industry: Media > News (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.63)

Google's Gemini AI comes to Chrome on iPhone and iPad

Engadget, Dec-11-2025, 13:00:03 GMT

It can summarize pages, create a FAQ on a topic and modify recipes for your dietary needs. After rolling it out on desktop and Android earlier in 2025, Google is finally bringing its built-in Gemini AI experience to iPhone and iPad. It offers new features like summarizing pages and helping you test your knowledge about a subject you're learning. As with any AI tool, though, it shouldn't be trusted for anything important given the possibility of hallucinations and other errors. When it arrives on your iOS device, tapping the spark icon at the left of the address bar (in place of the Google Lens camera) brings up a Pages tool that offers Lens and the new feature, Ask Gemini.

large language model, machine learning, natural language, (14 more...)

Engadget

Country: North America > United States (0.06)

Technology:

Information Technology > Communications > Mobile (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

The Story Behind TIME's 2025 Person of the Year Covers

TIME - Tech, Dec-11-2025, 12:44:00 GMT

Pine is the Creative Director at TIME. To illustrate the choice of the Architects of AI as TIME's 2025 Person of the Year, we asked two separate artists to help us visualize the incredibly complex technological revolution that is currently underway. London-based illustrator and graphics animator Peter Crowther and digital painter Jason Seiler each created an image that speaks to the duality AI has produced - man vs. machine. Inspired by the inner workings of computer chips, Crowther's intricate AI structure looms large over the busy construction site.

large language model, machine learning, natural language, (19 more...)

TIME - Tech

Country:

North America > United States > New York (0.05)
North America > United States > Illinois > Cook County > Chicago (0.05)

Genre: Personal > Honors (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.73)

Heirs of mother strangled by son accuse ChatGPT of making him delusional in lawsuit against OpenAI, Microsoft

FOX News, Dec-11-2025, 12:19:22 GMT

The heirs of an 83-year-old Connecticut mother who was killed by her son in August filed a lawsuit against OpenAI and Microsoft, claiming ChatGPT amplified his "paranoid delusions."

large language model, lawsuit, machine learning, (15 more...)

FOX News

Country:

North America > United States > Connecticut (0.26)
North America > United States > North Carolina (0.04)
North America > United States > New York (0.04)
(2 more...)

Industry:

Media (1.00)
Leisure & Entertainment > Sports (1.00)
Law > Litigation (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.67)

A Minimalist Optimizer Design for LLM Pretraining

Glentis, Athanasios, Li, Jiaxiang, Han, Andi, Hong, Mingyi

arXiv.org Artificial Intelligence, Dec-11-2025

Training large language models (LLMs) typically relies on adaptive optimizers such as Adam, which introduce extra operations and require significantly more memory than SGD to maintain first- and second-order moments. While recent works such as GaLore, Fira and APOLLO have proposed state-compressed variants to reduce memory consumption, a fundamental question remains: What are the minimum modifications to plain SGD needed to match state-of-the-art pretraining performance? We systematically investigate this question using a bottom-up approach, and identify two simple yet highly (memory- and compute-) efficient techniques: (1) column-wise gradient normalization (normalizing the gradient along the output dimension), which boosts SGD performance without momentum; and (2) applying first-order momentum only to the output layer, where gradient variance is highest. Combining these two techniques leads to SCALE (Stochastic Column-normAlized Last-layer momEntum), a simple optimizer for memory-efficient pretraining. Across multiple LLaMA models (60M-1B), SCALE matches or exceeds the performance of Adam while using only 35-45% of the total memory. It also consistently outperforms memory-efficient optimizers such as GaLore, Fira and APOLLO, making it a strong candidate for large-scale pretraining under memory constraints. For the LLaMA 7B model, SCALE outperforms the state-of-the-art memory-efficient methods APOLLO and Muon in terms of both perplexity and memory consumption.
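The two techniques named in this abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes each weight matrix is a nested list of shape (output dimension x input dimension), that "column-wise" means dividing each column by its L2 norm, and that the last-layer momentum is an exponential moving average applied before normalization. The names `column_normalize` and `scale_like_step` are hypothetical.

```python
import math


def column_normalize(grad, eps=1e-8):
    """Divide each column of a 2-D gradient (a list of rows) by its L2 norm.

    Assumption: the norm is taken over the rows (the output dimension).
    """
    n_rows, n_cols = len(grad), len(grad[0])
    norms = [
        math.sqrt(sum(grad[i][j] ** 2 for i in range(n_rows))) + eps
        for j in range(n_cols)
    ]
    return [[grad[i][j] / norms[j] for j in range(n_cols)] for i in range(n_rows)]


def scale_like_step(params, grads, momentum, lr=0.1, beta=0.9):
    """One SGD-style update over a list of weight matrices.

    The last matrix in `params` is treated as the output layer and is the
    only one with a momentum buffer (assumed to be a moving average).
    Returns the updated parameters and the updated momentum buffer.
    """
    last = len(params) - 1
    new_params = []
    for k, (w, g) in enumerate(zip(params, grads)):
        if k == last:
            # Momentum state is kept for the output layer only.
            momentum = [
                [beta * m + (1 - beta) * gij for m, gij in zip(m_row, g_row)]
                for m_row, g_row in zip(momentum, g)
            ]
            g = momentum
        update = column_normalize(g)
        new_params.append(
            [
                [wij - lr * uij for wij, uij in zip(w_row, u_row)]
                for w_row, u_row in zip(w, update)
            ]
        )
    return new_params, momentum
```

The only optimizer state here is a single buffer the size of the output layer, which is the kind of saving the abstract attributes to dropping per-parameter first- and second-order moments.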

large language model, machine learning, normalization, (20 more...)

arXiv.org Artificial Intelligence

2506.16659

Country:

Asia > Middle East > Jordan (0.05)
North America > United States > Minnesota (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Don't Throw Away Your Beams: Improving Consistency-based Uncertainties in LLMs via Beam Search

Fadeeva, Ekaterina, Goloburda, Maiya, Rubashevskii, Aleksandr, Vashurin, Roman, Shelmanov, Artem, Nakov, Preslav, Sachan, Mrinmaya, Panov, Maxim

arXiv.org Machine Learning, Dec-11-2025

Consistency-based methods have emerged as an effective approach to uncertainty quantification (UQ) in large language models. These methods typically rely on several generations obtained via multinomial sampling, measuring their agreement level. However, in short-form QA, multinomial sampling is prone to producing duplicates due to peaked distributions, and its stochasticity introduces considerable variance in uncertainty estimates across runs. We introduce a new family of methods that employ beam search to generate candidates for consistency-based UQ, yielding improved performance and reduced variance compared to multinomial sampling. We also provide a theoretical lower bound on the beam set probability mass under which beam search achieves a smaller error than multinomial sampling. We empirically evaluate our approach on six QA datasets and find that its consistent improvements over multinomial sampling lead to state-of-the-art UQ performance.
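As a rough illustration of the idea in this abstract, the sketch below scores disagreement among candidate answers, weighting each candidate by a probability. It is not the paper's estimator: the function names, the exact-match similarity, and the pairwise weighting formula are assumptions chosen for simplicity. With multinomial sampling, every sample would get an equal weight; with beam search, the weights would be the (renormalized) sequence probabilities of distinct beams.

```python
def exact_match(a, b):
    """Toy similarity: 1.0 if two answers match after normalization, else 0.0."""
    return 1.0 if a.strip().lower() == b.strip().lower() else 0.0


def weighted_disagreement(candidates, probs, similarity=exact_match):
    """Return 1 minus the probability-weighted pairwise agreement.

    `probs` are renormalized to sum to 1, so unnormalized beam
    probabilities or uniform sample weights can both be passed in.
    """
    total = sum(probs)
    weights = [p / total for p in probs]
    agreement = sum(
        wi * wj * similarity(a, b)
        for a, wi in zip(candidates, weights)
        for b, wj in zip(candidates, weights)
    )
    return 1.0 - agreement


# Hypothetical beam candidates with their sequence probabilities.
beam_answers = ["Paris", "paris ", "Lyon"]
beam_probs = [0.6, 0.2, 0.2]
uncertainty = weighted_disagreement(beam_answers, beam_probs)
```

Because beam search is deterministic for a fixed model and input, a score computed this way does not change between runs, which is in line with the reduced variance the abstract reports.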

beamsearch, computational linguistic, dataset, (13 more...)

arXiv.org Machine Learning

2512.09538

Country:

Europe > Austria > Vienna (0.14)
Europe > Middle East > Cyprus (0.04)
South America > Suriname > Marowijne District > Albina (0.04)
(3 more...)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.59)

Impact of Positional Encoding: Clean and Adversarial Rademacher Complexity for Transformers under In-Context Regression

He, Weiyi, Xing, Yue

arXiv.org Machine Learning, Dec-11-2025

Positional encoding (PE) is a core architectural component of Transformers, yet its impact on the Transformer's generalization and robustness remains unclear. In this work, we provide the first generalization analysis for a single-layer Transformer under in-context regression that explicitly accounts for a completely trainable PE module. Our result shows that PE systematically enlarges the generalization gap. Extending to the adversarial setting, we derive the adversarial Rademacher generalization bound. We find that the gap between models with and without PE is magnified under attack, demonstrating that PE amplifies the vulnerability of models. Our bounds are empirically validated by a simulation study. Together, this work establishes a new framework for understanding the clean and adversarial generalization in ICL with PE.

complexity, theorem 4, transformer, (12 more...)

arXiv.org Machine Learning

2512.09275

Country:

North America > United States > Michigan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.86)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Utility Boundary of Dataset Distillation: Scaling and Configuration-Coverage Laws

Luo, Zhengquan, Xu, Zhiqiang

arXiv.org Artificial Intelligence, Dec-11-2025

Dataset distillation (DD) aims to construct compact synthetic datasets that allow models to achieve comparable performance to full-data training while substantially reducing storage and computation. Despite rapid empirical progress, its theoretical foundations remain limited: existing methods (gradient, distribution, trajectory matching) are built on heterogeneous surrogate objectives and optimization assumptions, which makes it difficult to analyze their common principles or provide general guarantees. Moreover, it is still unclear under what conditions distilled data can retain the effectiveness of full datasets when the training configuration, such as optimizer, architecture, or augmentation, changes. To answer these questions, we propose a unified theoretical framework, termed configuration-dynamics-error analysis, which reformulates major DD approaches under a common generalization-error perspective and provides two main results: (i) a scaling law that provides a single-configuration upper bound, characterizing how the error decreases as the distilled sample size increases and explaining the commonly observed performance saturation effect; and (ii) a coverage law showing that the required distilled sample size scales linearly with configuration diversity, with provably matching upper and lower bounds. In addition, our unified analysis reveals that various matching methods are interchangeable surrogates, reducing the same generalization error, clarifying why they can all achieve dataset distillation and providing guidance on how surrogate choices affect sample efficiency and robustness. Experiments across diverse methods and configurations empirically confirm the derived laws, advancing a theoretical foundation for DD and enabling theory-driven design of compact, configuration-robust dataset distillation.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2512.05817

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Token Expand-Merge: Training-Free Token Compression for Vision-Language-Action Models

Ye, Yifan, Ma, Jiaqi, Cen, Jun, Lu, Zhihe

arXiv.org Artificial Intelligence, Dec-11-2025

Vision-Language-Action (VLA) models pretrained on large-scale multimodal datasets have emerged as powerful foundations for robotic perception and control. However, their massive scale, often billions of parameters, poses significant challenges for real-time deployment, as inference becomes computationally expensive and latency-sensitive in dynamic environments. To address this, we propose Token Expand-and-Merge-VLA (TEAM-VLA), a training-free token compression framework that accelerates VLA inference while preserving task performance. TEAM-VLA introduces a dynamic token expansion mechanism that identifies and samples additional informative tokens in the spatial vicinity of attention-highlighted regions, enhancing contextual completeness. These expanded tokens are then selectively merged in deeper layers under action-aware guidance, effectively reducing redundancy while maintaining semantic coherence. By coupling expansion and merging within a single feed-forward pass, TEAM-VLA achieves a balanced trade-off between efficiency and effectiveness, without any retraining or parameter updates. Extensive experiments on the LIBERO benchmark demonstrate that TEAM-VLA consistently improves inference speed while maintaining or even surpassing the task success rate of full VLA models. The code is publicly available at https://github.com/Jasper-aaa/TEAM-VLA.

arxiv preprint arxiv, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2512.09927

Country: Asia > Middle East (0.46)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
(2 more...)
