DeepSeek promises its new AI model has 'world-class' reasoning

The new models give users access to a 'cost-effective 1 million context length.'

DeepSeek has released its latest AI models, the V4 Pro and Flash versions, a bit over a year after it went viral and became the top-rated free app on Apple's App Store in the US. "Welcome to the era of cost-effective 1 million context length," DeepSeek said in its announcement. Context length is the maximum number of tokens an AI model can attend to at once, so the larger it is, the more coherent and consistent the model stays across extended conversations. OpenAI's recently announced GPT 5.5, for instance, has a context window ranging from 400,000 to 1 million tokens.
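For intuition, here is a minimal, illustrative Python sketch of how a chat client might work against a context-length budget. The 1,000,000 figure echoes DeepSeek's claim; the whitespace "tokenizer" and the `trim_history` helper are hypothetical stand-ins, not any real API.

```python
# Illustrative only: a model can attend to at most `CONTEXT_LENGTH` tokens,
# so a chat client typically drops the oldest turns once the budget is full.
CONTEXT_LENGTH = 1_000_000  # mirrors the "1 million context length" claim

def count_tokens(text: str) -> int:
    # Crude whitespace stand-in for a real tokenizer (an assumption).
    return len(text.split())

def trim_history(turns: list[str], budget: int = CONTEXT_LENGTH) -> list[str]:
    """Keep the most recent turns that still fit inside the context window."""
    kept, used = [], 0
    for turn in reversed(turns):           # walk backwards from the newest turn
        cost = count_tokens(turn)
        if used + cost > budget:
            break                          # older turns no longer fit
        kept.append(turn)
        used += cost
    return list(reversed(kept))            # restore chronological order

history = ["hello model"] * 5
print(trim_history(history, budget=6))     # only the 3 most recent turns fit
```

With a toy budget of 6 tokens, only the three most recent two-word turns survive; a larger window simply pushes that forgetting horizon further out, which is why context length matters for long conversations.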
In-Place Test-Time Training
Guhao Feng, Shengjie Luo, Kai Hua, Ge Zhang, Di He, Wenhao Huang, Tianle Cai
The static "train, then deploy" paradigm fundamentally limits Large Language Models (LLMs) from dynamically adapting their weights in response to the continuous streams of new information inherent in real-world tasks. Test-Time Training (TTT) offers a compelling alternative by updating a subset of model parameters (fast weights) at inference time, yet its potential in the current LLM ecosystem is hindered by critical barriers, including architectural incompatibility, computational inefficiency, and fast-weight objectives misaligned with language modeling. In this work, we introduce In-Place Test-Time Training (In-Place TTT), a framework that seamlessly endows LLMs with test-time training ability. In-Place TTT treats the final projection matrix of the ubiquitous MLP blocks as its adaptable fast weights, enabling a "drop-in" enhancement for LLMs without costly retraining from scratch. Furthermore, we replace TTT's generic reconstruction objective with a tailored, theoretically grounded objective explicitly aligned with the next-token-prediction task governing autoregressive language modeling. This principled objective, combined with an efficient chunk-wise update mechanism, results in a highly scalable algorithm compatible with context parallelism. Extensive experiments validate the framework's effectiveness: as an in-place enhancement, it enables a 4B-parameter model to achieve superior performance on tasks with contexts up to 128k tokens, and when pretrained from scratch, it consistently outperforms competitive TTT-related approaches. Ablation studies provide further insight into our design choices. Collectively, our results establish In-Place TTT as a promising step towards a paradigm of continual learning in LLMs.
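To make the core idea concrete, here is a minimal PyTorch sketch of chunk-wise test-time training on an MLP block's final projection. This is not the authors' implementation: `TinyLM`, `in_place_ttt`, the chunk size, and the learning rate are all illustrative assumptions, and a plain next-token cross-entropy stands in for the paper's tailored objective.

```python
# Sketch of the In-Place TTT idea under stated assumptions: freeze the model,
# treat the MLP's output projection ("down") as fast weights, and update it
# at inference time, chunk by chunk, on a next-token-prediction loss.
import torch
import torch.nn.functional as F

class TinyLM(torch.nn.Module):
    def __init__(self, vocab=256, d=64, hidden=128):
        super().__init__()
        self.embed = torch.nn.Embedding(vocab, d)
        self.up = torch.nn.Linear(d, hidden)    # slow weights (frozen at test time)
        self.down = torch.nn.Linear(hidden, d)  # fast weights (adapted at test time)
        self.head = torch.nn.Linear(d, vocab)

    def forward(self, tokens):
        h = self.embed(tokens)
        h = h + self.down(F.gelu(self.up(h)))   # MLP block with residual connection
        return self.head(h)

def in_place_ttt(model, tokens, chunk=32, lr=1e-2):
    """Chunk-wise test-time updates of the down-projection only."""
    fast = [model.down.weight, model.down.bias]
    opt = torch.optim.SGD(fast, lr=lr)
    for start in range(0, tokens.size(1) - 1, chunk):
        x = tokens[:, start:start + chunk]           # current chunk
        y = tokens[:, start + 1:start + chunk + 1]   # next-token targets
        logits = model(x)[:, :y.size(1)]
        loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), y.reshape(-1))
        opt.zero_grad()
        loss.backward()   # gradients flow only into the fast weights
        opt.step()

model = TinyLM()
for p in model.parameters():
    p.requires_grad_(False)                  # freeze everything ...
model.down.weight.requires_grad_(True)      # ... except the fast weights
model.down.bias.requires_grad_(True)
stream = torch.randint(0, 256, (1, 129))    # a toy token stream
in_place_ttt(model, stream)
```

Because only `down` receives gradients, the pretrained slow weights stay fixed, which mirrors the "drop-in" framing: an existing model's MLP output projection becomes the adaptable fast-weight store without retraining anything else.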