AITopics | Large Language Model

Large Language Model

AI chatbot maker Anthropic plans to raise $10bn to reach $350bn valuation

The Guardian | Jan-7-2026, 21:50:49 GMT

Anthropic is planning a $10bn fundraise that would value the Claude chatbot maker at $350bn, according to multiple reports published on Wednesday. The new valuation is nearly double the figure from about four months ago, per CNBC, which reported that the company had signed a term sheet that stipulated the $350bn figure. The round could close within weeks, although the size and terms could change.

ai chatbot maker anthropic plan, football newsletter business environment uk, valuation, (6 more...)

The Guardian

Country:

North America > United States (0.20)
Europe > Ukraine (0.09)
Oceania > Australia (0.06)
Asia > Singapore (0.06)

Industry:

Leisure & Entertainment > Sports (0.76)
Banking & Finance (0.71)
Government > Regional Government (0.56)

Technology:

Information Technology > Communications > Social Media (0.81)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.57)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.35)

ChatGPT is launching a new dedicated Health portal

Engadget | Jan-7-2026, 21:01:50 GMT

Be cautious if you opt to use it. OpenAI is launching a new facet for its AI chatbot called ChatGPT Health. This new feature will allow users to connect medical records and wellness apps to ChatGPT in order to get more tailored responses to queries about their health. The company noted that there will be additional privacy safeguards for this separate space within ChatGPT, and said that it will not use conversations held in Health for training foundational models. ChatGPT Health is currently in a testing stage, and there are some regional restrictions on which health apps can be connected to the AI company's platform.

chatbot, chatgpt, term and privacy policy, (7 more...)

Engadget

Industry: Health & Medicine > Health Care Technology (0.58)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

AI Models Are Starting to Learn by Asking Themselves Questions

WIRED | Jan-7-2026, 19:00:00 GMT

An AI model that learns without human input--by posing interesting queries for itself--might point the way to superintelligence. Even the smartest artificial intelligence models are essentially copycats. They learn either by consuming examples of human work or by trying to solve problems that have been set for them by human instructors. But perhaps AI can, in fact, learn in a more human way--by figuring out interesting questions to ask itself and attempting to find the right answer. A project from Tsinghua University, the Beijing Institute for General Artificial Intelligence (BIGAI), and Pennsylvania State University shows that AI can learn to reason in this way by playing with computer code.

absolute zero, ai model, university, (14 more...)

WIRED

Country:

North America > United States > Pennsylvania (0.25)
Asia > China > Beijing > Beijing (0.25)
North America > United States > North Carolina (0.05)
(5 more...)

Industry:

Information Technology (1.00)
Education (0.90)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.49)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)

The Download: war in Europe, and the company that wants to cool the planet

MIT Technology Review | Jan-7-2026, 13:10:00 GMT

Plus: Amazon has listed retailers' goods without their permission. Last spring, 3,000 British soldiers deployed an invisible automated intelligence network, known as a "digital targeting web," as part of a NATO exercise called Hedgehog in the damp forests of Estonia's eastern territories. The system had been cobbled together over the course of four months--an astonishing pace for weapons development, which is usually measured in years. Its purpose is to connect everything that looks for targets--"sensors," in military lingo--and everything that fires on them ("shooters") to a single, shared wireless electronic brain. Eighty years after total war last transformed the continent, the Hedgehog tests signal a brutal new calculus of European defense. But leaning too much on this new mathematics of warfare could be a risky bet. This story is from the next print issue of the magazine.

download, mit technology review, technology review, (14 more...)

MIT Technology Review

Country:

Europe > Estonia (0.25)
Asia > China (0.18)
North America > United States > Mississippi (0.05)
(4 more...)

Industry:

Government > Military (0.55)
Energy > Power Industry (0.48)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.30)

LLMs contain a LOT of parameters. But what's a parameter?

MIT Technology Review | Jan-7-2026, 11:23:47 GMT

LLMs contain a LOT of parameters. They're the mysterious numbers that make your favorite AI models tick. What are they and what do they do? I am writing this because one of my editors woke up in the middle of the night and scribbled on a bedside notepad: "What is a parameter?" Unlike a lot of thoughts that hit at 4 a.m., it's a really good question--one that goes right to the heart of how large language models work. A large language model's parameters are often said to be the dials and levers that control how it behaves.

dimension, llm, neuron, (14 more...)

MIT Technology Review

Country:

North America > United States > Massachusetts (0.04)
Asia > China (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Consolidate your AI apps into one platform built for creators and businesses -- now a flat $59.99

PCWorld | Jan-7-2026, 08:00:00 GMT

Get three years of access to 1min.AI's Advanced Business Plan for $59.99 (MSRP $299) and unlock a massive suite of AI tools powered by leading models like GPT-4o, Claude, Gemini, and more. Wrangling a different AI tool for every task gets messy fast -- one for writing, one for images, one for video, another for audio, and suddenly you have more subscriptions than actual productivity. That's where this $59.99 1min.AI deal steps in, giving you one platform that can do just about everything in your workflow. From content writing to SEO research to image editing to video generation, it's all baked in, powered by top models from OpenAI, Anthropic, Google, Meta, Mistral, and Cohere.

gaming laptop mobile monitor pc, mobile monitor pc, security software storage streaming wi-fi, (9 more...)

PCWorld

Country: North America > United States > California (0.05)

Industry:

Information Technology > Security & Privacy (0.61)
Information Technology > Smart Houses & Appliances (0.42)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.56)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.56)

From Entropy to Epiplexity: Rethinking Information for Computationally Bounded Intelligence

Finzi, Marc, Qiu, Shikai, Jiang, Yiding, Izmailov, Pavel, Kolter, J. Zico, Wilson, Andrew Gordon

arXiv.org Machine Learning | Jan-7-2026

Can we learn more from data than existed in the generating process itself? Can new and useful information be constructed from merely applying deterministic transformations to existing data? Can the learnable content in data be evaluated without considering a downstream task? On these questions, Shannon information and Kolmogorov complexity come up nearly empty-handed, in part because they assume observers with unlimited computational capacity and fail to target the useful information content. In this work, we identify and exemplify three seeming paradoxes in information theory: (1) information cannot be increased by deterministic transformations; (2) information is independent of the order of data; (3) likelihood modeling is merely distribution matching. To shed light on the tension between these results and modern practice, and to quantify the value of data, we introduce epiplexity, a formalization of information capturing what computationally bounded observers can learn from data. Epiplexity captures the structural content in data while excluding time-bounded entropy, the random unpredictable content exemplified by pseudorandom number generators and chaotic dynamical systems. With these concepts, we demonstrate how information can be created with computation, how it depends on the ordering of the data, and how likelihood modeling can produce more complex programs than present in the data generating process itself. We also present practical procedures to estimate epiplexity which we show capture differences across data sources, track with downstream performance, and highlight dataset interventions that improve out-of-distribution generalization. In contrast to principles of model selection, epiplexity provides a theoretical foundation for data selection, guiding how to select, generate, or transform data for learning systems.

information, large language model, machine learning, (19 more...)

arXiv.org Machine Learning

2601.0322

Country: North America > United States (0.67)

Genre: Research Report (0.63)

Industry:

Education (0.92)
Information Technology > Security & Privacy (0.67)
Leisure & Entertainment > Games > Chess (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (1.00)

Chronicals: A High-Performance Framework for LLM Fine-Tuning with 3.51x Speedup over Unsloth

Nair, Arjun S.

arXiv.org Machine Learning | Jan-7-2026

Large language model fine-tuning is bottlenecked by memory: a 7B parameter model requires 84GB--14GB for weights, 14GB for gradients, and 56GB for FP32 optimizer states--exceeding even A100-40GB capacity. We present Chronicals, an open-source training framework achieving 3.51x speedup over Unsloth through four synergistic optimizations: (1) fused Triton kernels eliminating 75% of memory traffic via RMSNorm (7x), SwiGLU (5x), and QK-RoPE (2.3x) fusion; (2) Cut Cross-Entropy reducing logit memory from 5GB to 135MB through online softmax computation; (3) LoRA+ with theoretically-derived 16x differential learning rates between adapter matrices; and (4) Best-Fit Decreasing sequence packing recovering 60-75% of compute wasted on padding. On Qwen2.5-0.5B with A100-40GB, Chronicals achieves 41,184 tokens/second for full fine-tuning versus Unsloth's 11,736 tokens/second (3.51x). For LoRA at rank 32, we reach 11,699 tokens/second versus Unsloth MAX's 2,857 tokens/second (4.10x). Critically, we discovered that Unsloth's reported 46,000 tokens/second benchmark exhibited zero gradient norms--the model was not training. We provide complete mathematical foundations: online softmax correctness proofs, FlashAttention IO complexity bounds O(N^2 d^2 M^{-1}), LoRA+ learning rate derivations from gradient magnitude analysis, and bin-packing approximation guarantees. All implementations, benchmarks, and proofs are available at https://github.com/Ajwebdevs/Chronicals with pip installation via https://pypi.org/project/chronicals/.
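The 84GB figure in the abstract can be reproduced with simple per-parameter byte counts. The sketch below is only a plausible reading of that breakdown, not the paper's own accounting: it assumes 2 bytes per parameter for weights and for gradients (16-bit precision), 8 bytes per parameter for FP32 optimizer states (for example two 4-byte Adam moments), and 1 GB = 10^9 bytes. The function name `finetune_memory_gb` is hypothetical.

```python
GB = 10**9  # assumption: decimal gigabytes, which matches the abstract's round numbers


def finetune_memory_gb(n_params, weight_bytes=2, grad_bytes=2, optim_bytes=8):
    """Estimate full fine-tuning memory in GB from per-parameter byte counts.

    Defaults are assumptions: 16-bit weights and gradients (2 bytes each) and
    FP32 optimizer state of 8 bytes per parameter (e.g. two Adam moments).
    Activations and temporary buffers are deliberately ignored.
    """
    weights = n_params * weight_bytes / GB
    gradients = n_params * grad_bytes / GB
    optimizer = n_params * optim_bytes / GB
    return {
        "weights": weights,
        "gradients": gradients,
        "optimizer": optimizer,
        "total": weights + gradients + optimizer,
    }


# 7B-parameter model, as in the abstract
breakdown = finetune_memory_gb(7 * 10**9)
for name, gigabytes in breakdown.items():
    print(f"{name:>10}: {gigabytes:.0f} GB")
```

Under these assumptions the optimizer state alone exceeds the 40GB of an A100-40GB, which is consistent with the abstract's framing of memory as the bottleneck.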

large language model, machine learning, natural language, (21 more...)

arXiv.org Machine Learning

2601.02609

Country: Europe (0.27)

Genre: Research Report > New Finding (0.67)

Industry: Education (0.65)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Detecting and Mitigating Treatment Leakage in Text-Based Causal Inference: Distillation and Sensitivity Analysis

Daoud, Adel, Johansson, Richard, Jerzak, Connor T.

arXiv.org Machine Learning | Jan-7-2026

Text-based causal inference increasingly employs textual data as proxies for unobserved confounders, yet this approach introduces a previously undertheorized source of bias: treatment leakage. Treatment leakage occurs when text intended to capture confounding information also contains signals predictive of treatment status, thereby inducing post-treatment bias in causal estimates. Critically, this problem can arise even when documents precede treatment assignment, as authors may employ future-referencing language that anticipates subsequent interventions. Despite growing recognition of this issue, no systematic methods exist for identifying and mitigating treatment leakage in text-as-confounder applications. This paper addresses this gap through three contributions. First, we provide formal statistical and set-theoretic definitions of treatment leakage that clarify when and why bias occurs. Second, we propose four text distillation methods -- similarity-based passage removal, distant supervision classification, salient feature removal, and iterative nullspace projection -- designed to eliminate treatment-predictive content while preserving confounder information. Third, we validate these methods through simulations using synthetic text and an empirical application examining International Monetary Fund structural adjustment programs and child mortality. Our findings indicate that moderate distillation optimally balances bias reduction against confounder retention, whereas overly stringent approaches degrade estimate precision.

information, large language model, machine learning, (22 more...)

arXiv.org Machine Learning

2601.024

Country: North America > United States (1.00)

Genre: Research Report > New Finding (1.00)

Industry:

Government (1.00)
Health & Medicine > Therapeutic Area > Pediatrics/Neonatology (0.49)
Health & Medicine > Therapeutic Area > Immunology (0.46)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
(2 more...)

Stand Up for Research, Innovation, and Education

MIT Technology Review | Jan-6-2026, 22:00:00 GMT

Our community is standing up for MIT and its mission to serve the nation and the world. And we need you to join us at this critical moment. This story was part of our September/October 2025 issue.

great ai hype correction, innovation, share story, (8 more...)

MIT Technology Review

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.06)
Asia > China > Beijing > Beijing (0.06)

Industry: Health & Medicine > Therapeutic Area (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.60)
