Dialogue Is Not Enough to Make a Communicative BabyLM (But Neither Is Developmentally Inspired Reinforcement Learning)
Francesca Padovani, Bastian Bunzeck, Manar Ali, Omar Momen, Arianna Bisazza, Hendrik Buschmeier, Sina Zarrieß
arXiv.org Artificial Intelligence
We investigate whether pre-training exclusively on dialogue data results in formally and functionally apt small language models. Based on this pre-trained model, dubbed llamalogue, we employ a variety of fine-tuning strategies to enforce "more communicative" text generations by our models. Although our models underperform on most standard BabyLM benchmarks, they excel at dialogue continuation prediction in a minimal pair setting. While PPO fine-tuning has mixed to adversarial effects on our models, DPO fine-tuning further improves their performance on our custom dialogue benchmark.
Dec-2-2025