tinystory
Readability ≠ Learnability: Rethinking the Role of Simplicity in Training Small Language Models
Lee, Ivan, Berg-Kirkpatrick, Taylor
Recent studies suggest that very small language models (SLMs) can generate surprisingly coherent text when trained on simplified, child-directed corpora such as TinyStories. These findings have been interpreted as evidence that readability -- characterized by accessible vocabulary, familiar narrative structure, and simple syntax -- plays a key role in enabling such capabilities to emerge. In this paper, we challenge that interpretation. We construct synthetic datasets with matched structure but varied readability, and find that readability alone does not predict coherence or learning efficiency in SLMs. Models trained on complex, adult-level text perform comparably to those trained on simplified language, and even exhibit faster development of coherence during training. Instead, we show that statistical simplicity, as measured by n-gram diversity, is a stronger predictor of learnability. Our findings caution against the growing trend of anthropomorphizing language model training -- drawing parallels to human cognitive development without empirical basis -- and argue for more precise reasoning about what properties actually support capability emergence in small models.
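The abstract's key predictor, n-gram diversity, can be made concrete with a minimal sketch. The distinct-n formulation below (unique n-grams over total n-grams) is one common way to operationalize statistical simplicity and is an assumption here; the paper's exact measure may differ.

```python
from collections import Counter

def ngram_diversity(tokens, n=2):
    """Distinct-n: fraction of n-grams in a token sequence that are unique.

    One common way to quantify the 'statistical simplicity' the abstract
    refers to; the paper's exact formulation may differ.
    """
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    return len(set(ngrams)) / len(ngrams)

# Lower distinct-n means more repetitive (statistically simpler) text.
child_directed = "the cat sat on the mat and the cat sat down".split()
adult_level = "epistemic humility tempers overconfident extrapolation from sparse data".split()
print(ngram_diversity(child_directed))  # repetitive -> lower diversity (0.8)
print(ngram_diversity(adult_level))     # varied -> higher diversity (1.0)
```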
Measuring LLM Novelty as the Frontier of Original and High-Quality Output
Padmakumar, Vishakh, Chen, Yueh-Han, Pan, Jane, Chen, Valerie, He, He
As large language models (LLMs) are increasingly used for ideation and scientific discovery, it is important to evaluate their ability to generate novel output. Prior work evaluates novelty as originality with respect to model training data, but original outputs may be of low quality. In contrast, non-expert judges more reliably score quality but may favor memorized outputs, limiting the reliability of human preference as a metric. We introduce a new novelty metric for LLM generations that balances originality and quality -- the harmonic mean of the fraction of n-grams unseen during training and a task-specific quality score. Using this framework, we identify trends that affect the novelty of generations from three families of open-data models (OLMo, OLMo-2, and Pythia) on three creative tasks: story completion, poetry writing, and creative tool use. We find that model-generated text from some base LLMs is less novel than human-written text from the internet. However, increasing model scale and post-training reliably improves novelty due to improvements in output quality. We also find that improving the base model at the same scale (e.g., OLMo 7B to OLMo-2 7B) leads to higher novelty due to higher originality. Finally, we observe that inference-time methods, such as prompting and providing novel in-context examples, have a much smaller effect on novelty, often increasing originality at the expense of quality. This highlights the need for further research into more effective elicitation strategies as we use models for creative applications.
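The metric itself is fully specified by the abstract: the harmonic mean of originality (the fraction of generated n-grams unseen during training) and a task-specific quality score. A minimal sketch follows, assuming both components are scaled to [0, 1]; the function and variable names are illustrative, not the authors' code.

```python
def novelty_score(generated_ngrams, training_ngrams, quality):
    """Harmonic mean of originality and quality, as the abstract describes.

    originality = fraction of generated n-grams unseen in the training data;
    quality is a task-specific score, assumed here to lie in [0, 1].
    """
    if not generated_ngrams:
        return 0.0
    unseen = sum(1 for g in generated_ngrams if g not in training_ngrams)
    originality = unseen / len(generated_ngrams)
    if originality + quality == 0:
        return 0.0
    return 2 * originality * quality / (originality + quality)

# A memorized-but-fluent output scores low on originality; a garbled-but-new
# output scores low on quality; the harmonic mean penalizes both failure modes.
training = {("once", "upon"), ("upon", "a"), ("a", "time")}
generated = [("once", "upon"), ("upon", "a"), ("a", "quark"), ("quark", "sang")]
print(novelty_score(generated, training, quality=0.9))  # ~0.643
```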
Parameterized Synthetic Text Generation with SimpleStories
Finke, Lennart, Sreedhara, Chandan, Dooms, Thomas, Allen, Mat, Zhang, Emerald, Rodriguez, Juan Diego, Nabeshima, Noa, Marshall, Thomas, Braun, Dan
We present SimpleStories, a large synthetic story dataset in simple language, consisting of 2 million samples each in English and Japanese. Through parameterizing prompts at multiple levels of abstraction, we achieve control over story characteristics at scale, inducing syntactic and semantic diversity. Ablations on a newly trained model suite show improved sample efficiency and model interpretability compared to models trained on the TinyStories dataset. We open-source all constituent parts of model creation, hoping to enable novel ways to study the end-to-end training process. As a byproduct, we advance the frontier for the fewest-parameter language model that outputs grammatical natural language.
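A minimal sketch of what "parameterizing prompts at multiple levels of abstraction" could look like: crossing independent parameter axes and instantiating a template. The axes, values, and template below are illustrative assumptions, not the actual SimpleStories generation pipeline.

```python
import itertools
import random

# Illustrative parameter axes; the real SimpleStories axes and values
# are assumptions here, not taken from the dataset's release.
TOPICS = ["a lost kitten", "a rainy day", "a new friend"]
MORALS = ["kindness", "patience", "honesty"]
STYLES = ["dialogue-heavy", "descriptive"]

TEMPLATE = (
    "Write a short story in simple language about {topic}. "
    "The story should illustrate {moral} and be {style}."
)

def sample_prompts(k=5, seed=0):
    """Cross the parameter axes, then sample without replacement: varying
    each axis independently is what induces syntactic and semantic diversity."""
    rng = random.Random(seed)
    grid = list(itertools.product(TOPICS, MORALS, STYLES))
    picks = rng.sample(grid, k)
    return [TEMPLATE.format(topic=t, moral=m, style=s) for t, m, s in picks]

for prompt in sample_prompts(3):
    print(prompt)
```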
BERTtime Stories: Investigating the Role of Synthetic Story Data in Language Pre-training
Theodoropoulos, Nikitas, Filandrianos, Giorgos, Lyberatos, Vassilis, Lymperaiou, Maria, Stamou, Giorgos
We describe our contribution to the Strict and Strict-Small tracks of the 2nd iteration of the BabyLM Challenge. The shared task is centered around efficient pre-training given data constraints motivated by human development. In response, we study the effect of synthetic story data in language pre-training using TinyStories: a recently introduced dataset of short stories. Initially, we train GPT-Neo models on subsets of TinyStories, while varying the amount of available data. We find that, even with access to less than 100M words, the models are able to generate high-quality, original completions to a given story, and acquire substantial linguistic knowledge. To measure the effect of synthetic story data, we train LTG-BERT encoder models on a combined dataset of: a subset of TinyStories, story completions generated by GPT-Neo, and a subset of the BabyLM dataset. Our experimentation reveals that synthetic data can occasionally offer modest gains, but overall has a negative influence on linguistic understanding. Our work offers an initial study on synthesizing story data in low-resource settings and underscores its potential for augmentation in data-constrained language modeling. We publicly release our models and implementation on our GitHub.
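A minimal sketch of the story-completion step described above, using the Hugging Face transformers generation API with an off-the-shelf GPT-Neo checkpoint; the checkpoint choice and decoding settings are assumptions, since the paper trains its own GPT-Neo models on TinyStories subsets.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Off-the-shelf checkpoint for illustration; the paper's models are
# trained on TinyStories subsets and are not assumed here.
name = "EleutherAI/gpt-neo-125m"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

story_prefix = "Once upon a time, a little fox found a shiny key."
inputs = tokenizer(story_prefix, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=60,
    do_sample=True,      # sampling encourages original completions
    top_p=0.95,
    temperature=0.8,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```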