infant
- Health & Medicine > Therapeutic Area > Neurology (0.94)
- Education (0.93)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > Canada (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)
Baby Intuitions Benchmark (BIB): Discerning the goals, preferences, and actions of others
To achieve human-like common sense about everyday life, machine learning systems must understand and reason about the goals, preferences, and actions of other agents in the environment. By the end of their first year of life, human infants intuitively achieve such common sense, and these cognitive achievements lay the foundation for humans' rich and complex understanding of the mental states of others. Can machines achieve generalizable, commonsense reasoning about other agents like human infants?
- North America > United States > New York (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Europe > Sweden > Stockholm > Stockholm (0.04)
Baby chimpanzees like to free fall through trees
Chimp infants are three times more likely to take risks than adults. Given the many similarities between humans and chimpanzees, one might assume that both species engage in risky behavior within the same age range. However, according to a recently published study, in chimps it's the infants you have to watch out for. After analyzing videos of 119 wild chimpanzees, researchers found that chimpanzees' risky behavior peaks in infancy and then lessens as they get older.
- North America > United States > Michigan (0.05)
- North America > Greenland (0.05)
- Asia > Thailand (0.05)
Opinion: Learning Intuitive Physics May Require More than Visual Data
Su, Ellen, Legris, Solim, Gureckis, Todd M., Ren, Mengye
Humans expertly navigate the world by building rich internal models founded on an intuitive understanding of physics. Meanwhile, despite training on vast quantities of internet video data, state-of-the-art deep learning models still fall short of human-level performance on intuitive physics benchmarks. This work investigates whether data distribution, rather than volume, is the key to learning these principles. We pretrain a Video Joint Embedding Predictive Architecture (V-JEPA) model on SAYCam, a developmentally realistic, egocentric video dataset partially capturing three children's everyday visual experiences. We find that training on this dataset, which represents 0.01% of the data volume used to train SOTA models, does not lead to significant performance improvements on the IntPhys2 benchmark. Our results suggest that merely training on a developmentally realistic dataset is insufficient for current architectures to learn representations that support intuitive physics. We conclude that varying visual data volume and distribution alone may not be sufficient for building systems with artificial intuitive physics.
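The V-JEPA objective mentioned above can be sketched in miniature: a context encoder embeds the visible video patches, a predictor regresses the embeddings of the masked patches, and the loss is measured in embedding space rather than pixel space. Everything below (the linear "encoder", the shapes, the mean-pooled context) is an illustrative assumption, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(patches, W):
    """Toy 'encoder': a linear map standing in for a ViT backbone."""
    return patches @ W

def jepa_loss(video_patches, mask, W_ctx, W_tgt, W_pred):
    """Joint-embedding predictive loss on masked video patches.

    video_patches: (num_patches, dim) flattened spatiotemporal patches
    mask:          boolean array, True = patch hidden from the context encoder
    """
    ctx = encoder(video_patches[~mask], W_ctx)      # context embeddings
    targets = encoder(video_patches[mask], W_tgt)   # target embeddings (held fixed in practice)
    # Predict every masked patch's embedding from the pooled context.
    pred = np.tile(ctx.mean(axis=0) @ W_pred, (mask.sum(), 1))
    return float(np.mean((pred - targets) ** 2))    # L2 in embedding space, not pixel space

patches = rng.normal(size=(16, 8))
mask = np.zeros(16, dtype=bool)
mask[10:] = True                                    # mask the last 6 patches
W = rng.normal(size=(8, 8))
loss = jepa_loss(patches, mask, W, W, rng.normal(size=(8, 8)))
print(loss >= 0.0)
```

The design point the abstract leans on is that prediction happens in a learned representation space, which is what makes the data distribution (developmental vs. internet video) a meaningful variable to isolate.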
Assessing the alignment between infants' visual and linguistic experience using multimodal language models
Tan, Alvin Wei Ming, Yang, Jane, Sepuri, Tarun, Aw, Khai Loong, Sparks, Robert Z., Yin, Zi, Marchman, Virginia A., Frank, Michael C., Long, Bria
Figuring out which objects or concepts words refer to is a central language learning challenge for young children. Most models of this process posit that children learn early object labels from co-occurrences of words and their referents that occur when someone around them talks about an object in the immediate physical environment. But how aligned in time are children's visual and linguistic experiences during everyday learning? To date, answers to this question have been limited by the need for labor-intensive manual annotations of vision-language co-occurrences. Here, we evaluate the use of contrastive language-image pretraining (CLIP) models to automatically characterize vision-language alignment in egocentric videos taken from the infant perspective in home environments. After validating CLIP alignment scores using human alignment judgments, we apply this metric to a large corpus of infant-perspective videos. We show that idealized aligned moments for learning (e.g., "look at the ball" with a ball present in the child's view) are relatively rare in children's everyday experiences compared to modern machine learning datasets, and highlight variability in alignment both within and across children. These findings suggest that infrequent alignment is a constraint for models describing early word learning and offer a new method for investigating children's multimodal environment.
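The alignment metric described above reduces to a cosine similarity between an image embedding and a text embedding, thresholded to flag candidate learning moments. A minimal sketch follows; in the actual pipeline the embeddings would come from a pretrained CLIP model and the threshold would be calibrated against human judgments, so the raw vectors and the 0.3 cutoff here are placeholders.

```python
import numpy as np

def alignment_score(image_emb, text_emb):
    """Cosine similarity between an image embedding (e.g., a frame from an
    egocentric video) and a text embedding (e.g., the concurrent utterance).
    Higher scores indicate tighter vision-language alignment."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_emb / np.linalg.norm(text_emb)
    return float(img @ txt)

def aligned_moments(frame_embs, utterance_embs, threshold=0.3):
    """Flag frame/utterance pairs whose similarity clears the threshold,
    mimicking the search for idealized learning moments ('look at the
    ball' while a ball is in the child's view)."""
    return [i for i, (f, u) in enumerate(zip(frame_embs, utterance_embs))
            if alignment_score(f, u) >= threshold]

v = np.array([1.0, 0.0, 0.0])
print(alignment_score(v, v))                           # identical -> 1.0
print(alignment_score(v, np.array([0.0, 1.0, 0.0])))   # orthogonal -> 0.0
```

Counting how often such pairs clear the threshold in infant-perspective video versus a machine learning corpus is exactly the comparison the abstract reports as "relatively rare."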
- North America > United States > Virginia (0.04)
- North America > United States > Ohio (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.05)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Research Report > Experimental Study (0.68)
- Research Report > Strength High (0.46)
- Health & Medicine > Therapeutic Area (0.68)
- Education > Educational Setting (0.47)
Chimpanzees' brutal battle for territory leads to a baby boom
A rival chimp can die in less than 15 minutes during these deadly territorial fights. New research led by UCLA and the University of Michigan has shown that chimp communities that kill their neighbors to gain territory also gain reproductive advantages. Uganda's Ngogo chimpanzees are well known for their "chimpanzee warfare": primatologists have observed their brutal, lethal fights, involving 10 or more chimpanzees, for decades, deciphering what leads to such violence.
- North America > United States > Michigan (0.25)
- North America > United States > California > Los Angeles County > Los Angeles (0.15)
- North America > United States > New Jersey (0.05)
Baby Sophia: A Developmental Approach to Self-Exploration through Self-Touch and Hand Regard
Zarifis, Stelios, Chalkiadakis, Ioannis, Chardouveli, Artemis, Moutzouri, Vasiliki, Sotirchos, Aggelos, Papadimitriou, Katerina, Filntisis, Panagiotis, Efthymiou, Niki, Maragos, Petros, Pastra, Katerina
Inspired by infant development, we propose a Reinforcement Learning (RL) framework for autonomous self-exploration in a robotic agent, Baby Sophia, using the BabyBench simulation environment. The agent learns self-touch and hand regard behaviors through intrinsic rewards that mimic an infant's curiosity-driven exploration of its own body. For self-touch, high-dimensional tactile inputs are transformed into compact, meaningful representations, enabling efficient learning. The agent then discovers new tactile contacts through intrinsic rewards and curriculum learning that encourage broad body coverage, balance, and generalization. For hand regard, visual features of the hands, such as skin color and shape, are learned through motor babbling. Then, intrinsic rewards encourage the agent to perform novel hand motions and to follow its hands with its gaze. A curriculum learning setup from single-hand to dual-hand training allows the agent to reach complex visual-motor coordination. The results of this work demonstrate that purely curiosity-based signals, with no external supervision, can drive coordinated multimodal learning, imitating an infant's progression from random motor babbling to purposeful behaviors.
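The curiosity-driven reward structure described above can be illustrated with the simplest possible novelty bonus: a count-based intrinsic reward that pays more for contacting a rarely touched body region than for revisiting a familiar one, which is what drives broad body coverage. The 1/sqrt(count) schedule and the region identifiers are assumptions for the sketch, not the paper's reward function.

```python
from collections import Counter

class NoveltyReward:
    """Count-based intrinsic reward: touching a rarely contacted body
    region pays more than revisiting a familiar one, nudging the agent
    toward broad body coverage with no external supervision."""

    def __init__(self):
        self.visits = Counter()

    def __call__(self, region_id):
        self.visits[region_id] += 1
        return 1.0 / self.visits[region_id] ** 0.5  # decays with repetition

reward = NoveltyReward()
print(reward("left_forearm"))   # first contact -> 1.0
print(reward("left_forearm"))   # second contact -> ~0.707
print(reward("right_knee"))     # new region -> 1.0 again
```

A curriculum, as in the abstract, would then widen the set of reachable regions (or move from single-hand to dual-hand tasks) as the novelty signal for the current set decays.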
BabyVLM: Data-Efficient Pretraining of VLMs Inspired by Infant Learning
Wang, Shengao, Chandra, Arjun, Liu, Aoming, Saligrama, Venkatesh, Gong, Boqing
Human infants rapidly develop visual reasoning skills from minimal input, suggesting that developmentally inspired pretraining could significantly enhance the efficiency of vision-language models (VLMs). Although recent efforts have leveraged infant-inspired datasets like SAYCam, existing evaluation benchmarks remain misaligned: they are either too simplistic, narrowly scoped, or tailored for large-scale pretrained models. Additionally, training exclusively on infant data overlooks the broader, diverse input from which infants naturally learn. To address these limitations, we propose BabyVLM, a novel framework comprising comprehensive in-domain evaluation benchmarks and a synthetic training dataset created via child-directed transformations of existing datasets. We demonstrate that VLMs trained with our synthetic dataset achieve superior performance on BabyVLM tasks compared to models trained solely on SAYCam or general-purpose data of the SAYCam size. BabyVLM thus provides a robust, developmentally aligned evaluation tool and illustrates how compact models trained on carefully curated data can generalize effectively, opening pathways toward data-efficient vision-language learning paradigms.
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > United States > Virginia (0.04)
- Asia > Middle East > Jordan (0.04)