AITopics | pareto optimal

MosaicBERT: ABidirectional Encoder Optimized for Fast Pretraining

Neural Information Processing SystemsApr-24-2026, 13:30:24 GMT

Although BERT-style encoder models are heavily used in NLP research, many researchers do not pretrain their own BERTs from scratch due to the high cost of training. In the past half-decade since BERT first rose to prominence, many advances have been made with other transformer architectures and training configurations that have yet to be systematically incorporated into BERT. Here, we introduce MosaicBERT, a BERT-style encoder architecture and training recipe that is empirically optimized for fast pretraining. This efficient architecture incorporates FlashAttention, Attention with Linear Biases (ALiBi), Gated Linear Units (GLU), a module to dynamically remove padded tokens, and low precision LayerNorm into the classic transformer encoder block. The training recipe includes a 30% masking ratio for the Masked Language Modeling (MLM) objective, bfloat16 precision, and vocabulary size optimized for GPU throughput, in addition to best-practices from RoBERTa and other encoder models. When pretrained from scratch on the C4 dataset, this base model achieves a downstream average GLUE (dev) score of 79.6 in 1.13 hours on 8 A100 80 GBGPUs at a cost of roughly $20. We plot extensive accuracy vs. pretraining speed Pareto curves and show that MosaicBERT base and large are consistently Pareto optimal when compared to a competitive BERT base and large. This empirical speed up in pretraining enables researchers and engineers to pretrain custom BERT-style models at low cost instead of finetune on existing generic models.

arxiv preprint arxiv, large language model, machine learning, (17 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.94)

Industry: Media (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

7a7f6cc5dc2a84fb4edf0feb8e5cfd50-Supplemental-Conference.pdf

Neural Information Processing SystemsFeb-15-2026, 06:43:20 GMT

artificial intelligence, evolutionary algorithm, machine learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Long Beach (0.14)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(9 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.68)

Add feedback

f91bd64a3620aad8e70a27ad9cb3ca57-Supplemental-Conference.pdf

Neural Information Processing SystemsFeb-12-2026, 22:45:49 GMT

gradient, inequality, pareto optimal, (13 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Add feedback

f91bd64a3620aad8e70a27ad9cb3ca57-Paper-Conference.pdf

Neural Information Processing SystemsFeb-12-2026, 22:45:45 GMT

algorithm, artificial intelligence, machine learning, (15 more...)

Neural Information Processing Systems

Country: Asia > China (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

MosaicBERT: A Bidirectional Encoder Optimized for Fast Pretraining Jacob Portes

Neural Information Processing SystemsFeb-7-2026, 14:55:32 GMT

Although BERT -style encoder models are heavily used in NLP research, many researchers do not pretrain their own BERTs from scratch due to the high cost of training. In the past half-decade since BERT first rose to prominence, many advances have been made with other transformer architectures and training configurations that have yet to be systematically incorporated into BERT.

large language model, machine learning, natural language, (16 more...)

Neural Information Processing Systems

Country:

Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > United Kingdom > England > Hampshire > Southampton (0.04)

Genre: Research Report > New Finding (0.94)

Industry: Media (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Maximin Relative Improvement: Fair Learning as a Bargaining Problem

Han, Jiwoo, Banerjee, Moulinath, Sun, Yuekai

arXiv.org Machine LearningFeb-5-2026

When deploying a single predictor across multiple subpopulations, we propose a fundamentally different approach: interpreting group fairness as a bargaining problem among subpopulations. This game-theoretic perspective reveals that existing robust optimization methods such as minimizing worst-group loss or regret correspond to classical bargaining solutions and embody different fairness principles. We propose relative improvement, the ratio of actual risk reduction to potential reduction from a baseline predictor, which recovers the Kalai-Smorodinsky solution. Unlike absolute-scale methods that may not be comparable when groups have different potential predictability, relative improvement provides axiomatic justification including scale invariance and individual monotonicity. We establish finite-sample convergence guarantees under mild conditions.

artificial intelligence, machine learning, relative improvement, (17 more...)

arXiv.org Machine Learning

2602.04155

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games (0.34)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Learnable Mixed Nash Equilibria are Collectively Rational

So, Geelon, Ma, Yi-An

arXiv.org Artificial IntelligenceOct-17-2025

We extend the study of learning in games to dynamics that exhibit non-asymptotic stability. We do so through the notion of uniform stability, which is concerned with equilibria of individually utility-seeking dynamics. Perhaps surprisingly, it turns out to be closely connected to economic properties of collective rationality. Under mild non-degeneracy conditions and up to strategic equivalence, if a mixed equilibrium is not uniformly stable, then it is not weakly Pareto optimal: there is a way for all players to improve by jointly deviating from the equilibrium. On the other hand, if it is locally uniformly stable, then the equilibrium must be weakly Pareto optimal. Moreover, we show that uniform stability determines the last-iterate convergence behavior for the family of incremental smoothed best-response dynamics, used to model individual and corporate behaviors in the markets. Unlike dynamics around strict equilibria, which can stabilize to socially-inefficient solutions, individually utility-seeking behaviors near mixed Nash equilibria lead to collective rationality.

artificial intelligence, equilibria, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2510.14907

Country: North America > United States (0.45)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Prediction-Specific Design of Learning-Augmented Algorithms

Li, Sizhe, Christianson, Nicolas, Li, Tongxin

arXiv.org Artificial IntelligenceOct-17-2025

Algorithms with predictions} has emerged as a powerful framework to combine the robustness of traditional online algorithms with the data-driven performance benefits of machine-learned (ML) predictions. However, most existing approaches in this paradigm are overly conservative, {as they do not leverage problem structure to optimize performance in a prediction-specific manner}. In this paper, we show that such prediction-specific performance criteria can enable significant performance improvements over the coarser notions of consistency and robustness considered in prior work. Specifically, we propose a notion of \emph{strongly-optimal} algorithms with predictions, which obtain Pareto optimality not just in the worst-case tradeoff between robustness and consistency, but also in the prediction-specific tradeoff between these metrics. We develop a general bi-level optimization framework that enables systematically designing strongly-optimal algorithms in a wide variety of problem settings, and we propose explicit strongly-optimal algorithms for several classic online problems: deterministic and randomized ski rental, and one-max search. Our analysis reveals new structural insights into how predictions can be optimally integrated into online algorithms by leveraging a prediction-specific design. To validate the benefits of our proposed framework, we empirically evaluate our algorithms in case studies on problems including dynamic power management and volatility-based index trading. Our results demonstrate that prediction-specific, strongly-optimal algorithms can significantly improve performance across a variety of online decision-making settings.

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2510.14887

Genre: Research Report > New Finding (0.85)

Industry: