macwhinney
UD-English-CHILDES: A Collected Resource of Gold and Silver Universal Dependencies Trees for Child Language Interactions
Yang, Xiulin, Ju, Zhuoxuan, Bu, Lanni, Liu, Zoey, Schneider, Nathan
CHILDES is a widely used resource of transcribed child and child-directed speech. This paper introduces UD-English-CHILDES, the first officially released Universal Dependencies (UD) treebank. It is derived from previously dependency-annotated CHILDES data, which we harmonize to follow unified annotation principles. The gold-standard trees encompass utterances sampled from 11 children and their caregivers, totaling over 48K sentences (236K tokens). We validate these gold-standard annotations under the UD v2 framework and provide an additional 1M~silver-standard sentences, offering a consistent resource for computational and linguistic research.
- Europe > Czechia > Prague (0.04)
- North America > United States > Texas > Travis County > Austin (0.04)
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
- (6 more...)
Morphosyntactic Analysis for CHILDES
Liu, Houjun, MacWhinney, Brian
Language development researchers are interested in comparing the process of language learning across languages. Unfortunately, it has been difficult to construct a consistent quantitative framework for such comparisons. However, recent advances in AI (Artificial Intelligence) and ML (Machine Learning) are providing new methods for ASR (automatic speech recognition) and NLP (natural language processing) that can be brought to bear on this problem. Using the Batchalign2 program (Liu et al., 2023), we have been transcribing and linking data for the CHILDES database and have applied the UD (Universal Dependencies) framework to provide a consistent and comparable morphosyntactic analysis for 27 languages. These new resources open possibilities for deeper crosslinguistic study of language learning.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
- North America > United States > Michigan (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (7 more...)
- Health & Medicine (1.00)
- Education > Curriculum > Subject-Specific Education (0.45)
Editorial: Alzheimer's Dementia Recognition through Spontaneous Speech
While a number of studies have investigated speech and language features for the detection of AD and mild cognitive impairment (Fraser et al., 2016), and proposed various signal processing and machine learning methods for this task (Petti et al., 2020), the field still lacks balanced benchmark data against which different approaches can be systematically compared. This Research Topic addresses this issue by exploring the use of speech characteristics for AD recognition using balanced data and shared tasks, such as those provided by the ADReSS Challenges (Luz et al., 2020(Luz et al., , 2021. These tasks have brought together groups working on this active area of research, providing the community with benchmarks for comparison of speech and language approaches to cognitive assessment. Reflecting the multidisciplinary character of the topic, the articles in this collection span three journals: Frontiers of Aging Neuroscience, Frontiers of Computer Science and Frontiers in Psychology.Most papers in this Reseach Topic target two main tasks: AD classification, for distinguishing individuals with AD from healthy controls, and cognitive test score regression, to infer the patient's Mini Mental Status Examination (MMSE) score (Folstein et al., 1975). Of the twenty papers published in this collection, 14 used the ADReSS dataset (Luz et al., 2020), by itself or in combination with other data. The ADReSS dataset is a curated subset of DementiaBank's Pitt Corpus, matched for age and ge...
- North America > United States > Wisconsin (0.05)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- (3 more...)
- Health & Medicine > Therapeutic Area > Neurology > Dementia (0.53)
- Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (0.35)
MUTLA: A Large-Scale Dataset for Multimodal Teaching and Learning Analytics
Xu, Fangli, Wu, Lingfei, Thai, KP, Hsu, Carol, Wang, Wei, Tong, Richard
Automatic analysis of teacher and student interactions could be very important to improve the quality of teaching and student engagement. However, despite some recent progress in utilizing multimodal data for teaching and learning analytics, a thorough analysis of a rich multimodal dataset coming for a complex real learning environment has yet to be done. To bridge this gap, we present a large-scale MUlti-modal Teaching and Learning Analytics (MUTLA) dataset. This dataset includes time-synchronized multimodal data records of students (learning logs, videos, EEG brainwaves) as they work in various subjects from Squirrel AI Learning System (SAIL) to solve problems of varying difficulty levels. The dataset resources include user records from the learner records store of SAIL, brainwave data collected by EEG headset devices, and video data captured by web cameras while students worked in the SAIL products. Our hope is that by analyzing real-world student learning activities, facial expressions, and brainwave patterns, researchers can better predict engagement, which can then be used to improve adaptive learning selection and student learning outcomes. An additional goal is to provide a dataset gathered from the real-world educational activities versus those from controlled lab environments to benefit educational learning community.
- South America > Uruguay > Maldonado > Maldonado (0.05)
- North America > United States > New Jersey (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- Asia > China > Beijing > Beijing (0.04)
Cognitive Science in the era of Artificial Intelligence: A roadmap for reverse-engineering the infant language-learner
During their first years of life, infants learn the language(s) of their environment at an amazing speed despite large cross cultural variations in amount and complexity of the available language input. Understanding this simple fact still escapes current cognitive and linguistic theories. Recently, spectacular progress in the engineering science, notably, machine learning and wearable technology, offer the promise of revolutionizing the study of cognitive development. Machine learning offers powerful learning algorithms that can achieve human-like performance on many linguistic tasks. Wearable sensors can capture vast amounts of data, which enable the reconstruction of the sensory experience of infants in their natural environment. The project of 'reverse engineering' language development, i.e., of building an effective system that mimics infant's achievements appears therefore to be within reach. Here, we analyze the conditions under which such a project can contribute to our scientific understanding of early language development. We argue that instead of defining a sub-problem or simplifying the data, computational models should address the full complexity of the learning situation, and take as input the raw sensory signals available to infants. This implies that (1) accessible but privacy-preserving repositories of home data be setup and widely shared, and (2) models be evaluated at different linguistic levels through a benchmark of psycholinguist tests that can be passed by machines and humans alike, (3) linguistically and psychologically plausible learning architectures be scaled up to real data using probabilistic/optimization principles from machine learning. We discuss the feasibility of this approach and present preliminary results.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.28)
- North America > United States > New York > New York County > New York City (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- (16 more...)
- Research Report (1.00)
- Overview (1.00)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Education (1.00)