 Media


Chris Christie tells Bill Maher that Republicans talk very differently about Donald Trump behind closed doors

FOX News

During a Friday appearance on "Real Time with Bill Maher," Chris Christie claimed that Sen. Lindsey Graham privately criticizes President Donald Trump despite publicly praising him.



'Melania' film exposes massive divide as audience score hits 99 percent despite rigging claims

FOX News

Rotten Tomatoes denies claims that the "Melania" film's 99% audience score was inflated; the film shows a gap of more than 93 points between its audience and critic scores.




MosaicBERT: A Bidirectional Encoder Optimized for Fast Pretraining

Jacob Portes

Neural Information Processing Systems

Although BERT-style encoder models are heavily used in NLP research, many researchers do not pretrain their own BERTs from scratch due to the high cost of training. In the past half-decade since BERT first rose to prominence, many advances have been made with other transformer architectures and training configurations that have yet to be systematically incorporated into BERT.



Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models

Neural Information Processing Systems

Large language models are usually fine-tuned to align with human preferences. However, fine-tuning a large language model can be challenging. In this work, we introduce weak-to-strong search, framing the alignment of a large language model as a test-time greedy search to maximize the log-probability difference between small tuned and untuned models while sampling from the frozen large model. This method serves both as (1) a compute-efficient model up-scaling strategy that avoids directly tuning the large model and as (2) an instance of weak-to-strong generalization that enhances a strong model with weak test-time guidance.
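
The search described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes token-level greedy selection, and the names `weak_to_strong_greedy`, `toy_sample_large`, `toy_logp_tuned`, and `toy_logp_untuned` are hypothetical stand-ins for real models. At each step, candidates come from the frozen large model, and the one with the largest log-probability difference between the small tuned and small untuned models is kept.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical interfaces (assumptions, not from the paper):
# a sampler returns k candidate tokens given a prefix, and a scorer
# returns the log-probability of one candidate token given a prefix.
Sampler = Callable[[Sequence[str], int], List[str]]
LogProbFn = Callable[[Sequence[str], str], float]


def weak_to_strong_greedy(
    sample_large: Sampler,
    logp_tuned: LogProbFn,
    logp_untuned: LogProbFn,
    steps: int,
    num_candidates: int,
) -> List[str]:
    """Greedy test-time search guided by small tuned/untuned models."""
    prefix: List[str] = []
    for _ in range(steps):
        # Candidates are proposed by the frozen large model.
        candidates = sample_large(prefix, num_candidates)
        # Keep the candidate with the largest log-probability difference
        # between the small tuned model and the small untuned model.
        best = max(
            candidates,
            key=lambda tok: logp_tuned(prefix, tok) - logp_untuned(prefix, tok),
        )
        prefix.append(best)
    return prefix


# Toy stand-ins so the sketch runs without any model weights.
VOCAB = ["good", "ok", "bad"]
TUNED_PROBS = {"good": 0.7, "ok": 0.2, "bad": 0.1}


def toy_sample_large(prefix: Sequence[str], k: int) -> List[str]:
    # Deterministic "sampling": return the first k vocabulary items.
    return VOCAB[:k]


def toy_logp_tuned(prefix: Sequence[str], token: str) -> float:
    return math.log(TUNED_PROBS[token])


def toy_logp_untuned(prefix: Sequence[str], token: str) -> float:
    # The untuned toy model is uniform over the vocabulary.
    return math.log(1.0 / len(VOCAB))


result = weak_to_strong_greedy(
    toy_sample_large, toy_logp_tuned, toy_logp_untuned, steps=3, num_candidates=3
)
print(result)
```

The large model is only sampled from and never updated, which is what lets the small models steer it without tuning it directly.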



Language Models are Few-Shot Learners

Neural Information Processing Systems

Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model.
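
As a rough illustration of specifying a task and its demonstrations purely as text, the sketch below assembles a few-shot prompt string. The `Input:`/`Output:` layout and the function name `build_few_shot_prompt` are assumptions made for this example, not a format given in the abstract; no model call is made.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    task_description: str,
    demonstrations: List[Tuple[str, str]],
    query: str,
) -> str:
    """Assemble a plain-text prompt: task, worked examples, then the query."""
    lines = [task_description, ""]
    for source, target in demonstrations:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
        lines.append("")
    # The final example is left incomplete for the model to continue.
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    "Translate English to French.",
    [("cheese", "fromage"), ("sea otter", "loutre de mer")],
    "peppermint",
)
print(prompt)
```

Because the demonstrations live only in the prompt, changing the task means changing the text, with no gradient updates to the model.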