AITopics | tpp

Collaborating Authors

tpp

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Power Lines: Scaling Laws for Weight Decay and Batch Size in LLMPre-training

Neural Information Processing SystemsJun-22-2026, 04:12:55 GMT

Efficient LLM pre-training requires well-tuned hyperparameters (HPs), including learning rate η and weight decay λ. We study scaling laws for HPs: formulas for how to scale HPs as we scale model size N, dataset size D, and batch size B. Recent work [1] suggests the AdamW timescale, τ = B/(ηλD), should remain constant across training settings, and we verify the implication that optimal λscales linearly with B, for a fixed N and D. However, as N and Dscale, we show optimal τ obeys a precise power law in the tokens-per-parameter ratio, D/N. This law thus provides a method to accurately predict λopt in advance of large-scale training. We also study scaling laws for optimal batch size Bopt (the B enabling lowest loss at a given N,D) and critical batch size Bcrit (the B beyond which further data parallelism becomes ineffective). In contrast to prior work, we find both Bopt and Bcrit scale as power laws in D, independent of model size, N. Finally, we analyze how these findings inform the real-world selection of Pareto-optimal N and D under dual training time and compute objectives.

large language model, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry: Energy > Power Industry (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

00ac8ed3b4327bdd4ebbebcb2ba10a00-Paper.pdf

Neural Information Processing SystemsApr-30-2026, 19:56:40 GMT

data mining, machine learning, neural information processing system, (16 more...)

Neural Information Processing Systems

Country:

North America (0.28)
Europe > Germany (0.28)

Industry:

Health & Medicine (0.46)
Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.94)
Information Technology > Data Science > Data Mining (0.93)
Information Technology > Communications (0.70)
(3 more...)

Add feedback

e7db14e12fb49c1d78a573e6e5f542c2-Paper.pdf

Neural Information Processing SystemsFeb-10-2026, 22:06:48 GMT

boundary, lemma 3, phase transition, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania (0.04)
North America > Canada (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Prompt-augmented Temporal Point Process for Streaming Event Sequence Siqiao Xue, Y an Wang

Neural Information Processing SystemsFeb-10-2026, 15:45:14 GMT

In real-world applications, event data is typically received in a streaming manner, where the distribution of patterns may shift over time. Additionally, privacy and memory constraints are commonly observed in practical scenarios, further compounding the challenges.

knowledge management, machine learning, natural language, (16 more...)

Neural Information Processing Systems

Country:

North America > United States (0.04)
North America > Canada > British Columbia > Vancouver (0.04)
Asia > China > Zhejiang Province > Hangzhou (0.04)

Genre: Research Report > Experimental Study (0.46)

Industry:

Education (0.46)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Knowledge Management (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

DetectingAnomalousEventSequences withTemporalPointProcesses

Neural Information Processing SystemsFeb-9-2026, 07:27:38 GMT

For example, transactions infinancial systems, serverlogs, and user activity traces can allnaturally be represented as discrete events in continuous time.

artificial intelligence, data mining, machine learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > Alaska > Anchorage Municipality > Anchorage (0.04)
North America > United States > California > Riverside County > Hemet (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Anomaly Detection (0.48)

Add feedback

6faa8040da20ef399b63a72d0e4ab575-Paper.pdf

Neural Information Processing SystemsFeb-9-2026, 07:27:35 GMT

detection, statistic, statistics, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Alaska > Anchorage Municipality > Anchorage (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
North America > United States > California > San Mateo County > San Mateo (0.04)
Asia > Middle East > Jordan (0.04)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (0.72)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Add feedback

00ac8ed3b4327bdd4ebbebcb2ba10a00-Paper.pdf

Neural Information Processing SystemsFeb-7-2026, 06:59:49 GMT

international conference, neural information processing system, point process, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > New York (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Industry:

Health & Medicine (0.46)
Government (0.46)

Technology:

Information Technology > Communications (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
(3 more...)

Add feedback

Power Lines: Scaling Laws for Weight Decay and Batch Size in LLM Pre-training

Bergsma, Shane, Dey, Nolan, Gosal, Gurpreet, Gray, Gavia, Soboleva, Daria, Hestness, Joel

arXiv.org Artificial IntelligenceNov-25-2025

Efficient LLM pre-training requires well-tuned hyperparameters (HPs), including learning rate $η$ and weight decay $λ$. We study scaling laws for HPs: formulas for how to scale HPs as we scale model size N, dataset size D, and batch size B. Recent work suggests the AdamW timescale, $τ= B/(ηλD)$, should remain constant across training settings, and we verify the implication that optimal $λ$ scales linearly with B, for a fixed N and D. However, as N and D scale, we show optimal $τ$ obeys a precise power law in the tokens-per-parameter ratio, D/N. This law thus provides a method to accurately predict $λ$opt in advance of large-scale training. We also study scaling laws for optimal batch size Bopt (the B enabling lowest loss at a given N,D) and critical batch size Bcrit (the B beyond which further data parallelism becomes ineffective). In contrast to prior work, we find both Bopt and Bcrit scale as power laws in D, independent of model size, N. Finally, we analyze how these findings inform the real-world selection of Pareto-optimal N and D under dual training time and compute objectives. All experiments were run on Cerebras CS-3 systems.

large language model, machine learning, natural language, (12 more...)

arXiv.org Artificial Intelligence

2505.13738

Genre: Research Report > New Finding (0.67)

Industry: Energy > Power Industry (0.51)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

PTPP-Aware Adaptation Scaling Laws: Predicting Domain-Adaptation Performance at Unseen Pre-Training Budgets

Goffinet, Etienne, Bergsma, Shane, Sheinin, Avraham, Vassilieva, Natalia, Muhammad, Shaheer, Nakov, Preslav, Gosal, Gurpreet

arXiv.org Artificial IntelligenceOct-28-2025

Continual pre-training (CPT) for domain adaptation must balance target-domain gains with stability on the base domain. Existing CPT scaling laws typically assume a fixed pre-training budget, which limits their ability to forecast adaptation outcomes for models trained at different tokens-per-parameter (PTPP). We present \emph{PTPP-aware} adaptation scaling laws that make the pre-training budget an explicit variable, enabling accurate \emph{prediction} of adaptation loss at unseen \ptpp. On a multilingual setup (English/Arabic $\rightarrow$ French), PTPP-aware formulations trained on early stages (\ptpp{}=\{15,31\}) predict target loss at \ptpp{}=279 and outperform a PTPP-agnostic \dcpt{} transfer baseline on metrics (Huber-on-log, MAE$_\mathrm{rel}$, calibration slope); full diagnostics (RMSE, MAPE) are in the appendix. Beyond forecasting, we show a practical use case: planning replay ratios and adaptation token budgets that satisfy target and forgetting constraints under compute limits.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2510.23198

Country: Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)

Genre: Research Report (0.44)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Add feedback

Speculative Sampling for Parametric Temporal Point Processes

Biloš, Marin, Schneider, Anderson, Nevmyvaka, Yuriy

arXiv.org Artificial IntelligenceOct-24-2025

Temporal point processes are powerful generative models for event sequences that capture complex dependencies in time-series data. They are commonly specified using autoregressive models that learn the distribution of the next event from the previous events. This makes sampling inherently sequential, limiting efficiency. In this paper, we propose a novel algorithm based on rejection sampling that enables exact sampling of multiple future values from existing TPP models, in parallel, and without requiring any architectural changes or retraining. Besides theoretical guarantees, our method demonstrates empirical speedups on real-world datasets, bridging the gap between expressive modeling and efficient parallel generation for large-scale TPP applications.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2510.20031

Genre: Research Report (1.00)

Industry: Banking & Finance (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback