
Collaborating Authors: goldberg



Interpreting and improving natural-language processing (in machines) with natural language-processing (in the brain)

Mariya Toneva, Leila Wehbe

Neural Information Processing Systems

We use brain imaging recordings of subjects reading complex natural text to interpret word and sequence embeddings from four recent NLP models: ELMo, USE, BERT, and Transformer-XL. We study how their representations differ across layer depth, context length, and attention type.
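A standard way to relate NLP embeddings to brain recordings is an encoding model: a regularized linear map from embedding features to recorded activity, scored by how well it predicts held-out responses. The sketch below is a minimal illustration using synthetic stand-ins for both embeddings and recordings; the array shapes, ridge penalty, and correlation scoring are assumptions, not the paper's exact protocol.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-ins: 1000 time points, 64-dim "embeddings", 50 "voxels".
n_samples, emb_dim, n_voxels = 1000, 64, 50
X = rng.standard_normal((n_samples, emb_dim))            # embedding features
true_map = 0.2 * rng.standard_normal((emb_dim, n_voxels))
Y = X @ true_map + rng.standard_normal((n_samples, n_voxels))  # noisy "recordings"

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.2, random_state=0)

# Ridge encoding model: predict voxel activity from embedding features.
model = Ridge(alpha=10.0).fit(X_tr, Y_tr)
pred = model.predict(X_te)

# Per-voxel correlation between predicted and held-out activity; high values
# indicate the embedding carries information predictive of the recordings.
corr = [np.corrcoef(pred[:, v], Y_te[:, v])[0, 1] for v in range(n_voxels)]
print(f"mean prediction correlation: {np.mean(corr):.2f}")
```

Comparing such scores across layers or context lengths is one way to localize where brain-relevant information lives in a model.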


Dyslexia and the Reading Wars

The New Yorker

Proven methods for teaching the readers who struggle most have been known for decades. Why do we often fail to use them? "There's a window of opportunity to intervene," Mark Seidenberg, a cognitive neuroscientist, said. "You don't want to let that go." In 2024, my niece Caroline received a Ph.D. in gravitational-wave physics. Her research interests include "the impact of model inaccuracies on biases in parameters recovered from gravitational wave data" and "Petrov type, principal null directions, and Killing tensors of slowly rotating black holes in quadratic gravity." I watched a little of her dissertation defense, on Zoom, and was lost as soon as she'd finished introducing herself. She and her husband now live in Italy, where she has a postdoctoral appointment. Caroline's academic achievements seem especially impressive if you know that until third grade she could barely read: to her, words on a page looked like a pulsing mass. She attended a private school in Connecticut, and there was a set time every day when students selected books to read on their own. "I can't remember how long that lasted, but it felt endless," she told me. She hid her disability by turning pages when her classmates did, and by volunteering to draw illustrations during group story-writing projects. One day, she told her grandmother that she could sound out individual letters but when she got to "the end of a row" she couldn't remember what had come before. A psychologist eventually identified her condition as dyslexia. Fluent readers sometimes think of dyslexia as a tendency to put letters in the wrong order or facing the wrong direction, but it's more complicated than that.


Philly's 'transit vigilante' created a real-time bus tracker for his neighbors

Popular Science

With a sports timer and some clever coding, Max Goldberg built a DIY display that tells South Philly commuters exactly when their next bus will arrive. Philadelphia's mass transit system has had a rough go of it lately. The Pennsylvania city's main public transit provider, SEPTA, has been dealing with massive service cuts, including the elimination of entire bus routes. But South Philly resident Max Goldberg is undeterred.
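A display like this typically boils down to polling a real-time arrival feed and rendering a minutes-to-arrival countdown. The sketch below parses a hypothetical JSON payload; the field names are invented for illustration and are not SEPTA's actual API.

```python
import json
from datetime import datetime, timezone

# Hypothetical feed payload; "stop_eta" and "vehicles" are illustrative names.
payload = json.dumps({
    "route": "45",
    "vehicles": [
        {"stop_eta": "2024-06-01T14:05:00+00:00"},
        {"stop_eta": "2024-06-01T14:17:00+00:00"},
    ],
})

def minutes_until_next_bus(raw: str, now: datetime) -> int:
    """Return whole minutes until the soonest predicted arrival after `now`."""
    etas = [datetime.fromisoformat(v["stop_eta"]) for v in json.loads(raw)["vehicles"]]
    soonest = min(eta for eta in etas if eta > now)
    return int((soonest - now).total_seconds() // 60)

now = datetime(2024, 6, 1, 14, 1, tzinfo=timezone.utc)
print(minutes_until_next_bus(payload, now))  # → 4
```

A real build would fetch the payload on a timer and push the result to the display hardware.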


Machine learning methods fail to provide cohesive atheoretical construction of personality traits from semantic embeddings

Bouguettaya, Ayoub, Stuart, Elizabeth M.

arXiv.org Artificial Intelligence

Here, we test this hypothesis using novel machine learning methods to create a bottom-up, atheoretical model of personality from the same trait-descriptive adjective list that led to the dominant contemporary model of personality (the Big Five). We then compare the descriptive utility of the resulting lexical clusters to that of the established Big Five personality model in how well each describes conversations online (on Reddit forums). Our analysis of 1 million online comments shows that the Big Five model provides a much more powerful and interpretable description of these communities and the differences between them. Specifically, the dimensions of Agreeableness, Conscientiousness, and Neuroticism effectively distinguish Reddit communities. In contrast, our lexical clusters do not provide meaningful distinctions and fail to describe the spread. Validation against the International Personality Item Pool confirmed the Big Five model's superior psychometric coherence, and our machine learning methods notably failed to recover the trait of Extraversion. These results affirm the robustness of the Big Five, while also showing that the semantic structure of personality likely depends on social context. Our findings suggest that while machine learning can help with understanding and explaining human behavior, especially by checking the ecological validity of existing theories, machine learning methods may not be able to replace established psychological theories.
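A bottom-up lexical model of the kind described can be sketched as clustering semantic embeddings of trait adjectives with no Big Five labels in the loop. The snippet below fabricates toy embeddings with three latent groups as a stand-in; in the study, embeddings would come from a language model over the full trait-adjective list, and the adjectives and cluster count here are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)

# Toy stand-in: nine adjectives drawn from three fabricated latent "traits".
adjectives = ["kind", "warm", "helpful", "tidy", "careful", "thorough",
              "anxious", "moody", "tense"]
centers = rng.standard_normal((3, 16))
emb = np.vstack([centers[i // 3] + 0.1 * rng.standard_normal(16)
                 for i in range(len(adjectives))])

# Bottom-up, atheoretical clustering: no personality theory constrains this.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(emb)
for cluster in range(3):
    members = [a for a, l in zip(adjectives, labels) if l == cluster]
    print(cluster, members)
```

The paper's finding is that clusters obtained this way, unlike the Big Five dimensions, failed to distinguish real online communities.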


Appendix: A Dual-Stream Neural Network Explains the Functional Segregation of Dorsal and Ventral Visual Pathways in Human Brains

Choi, Minkyu

Neural Information Processing Systems

Fig. S1 displays the full set of region labels corresponding to the regions with significant voxels from Fig. 3(a) in the main text. As detailed in Section 3.1 of the main text, our model underwent a three-stage training process. After the pre-training in Stage 1, we conducted a fine-tuning process in which the WhereCNN was incorporated to guide the WhatCNN's fixations: the model samples fixations from the saliency maps predicted by the WhereCNN. All training stages were conducted using four NVIDIA A40 GPUs. Figure S2 illustrates the process of determining the next fixation point given the current fixation.
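The fixation-sampling step described above can be sketched as drawing a location from a softmax distribution over a predicted saliency map. The map, grid size, and temperature below are illustrative assumptions, not the model's actual outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 8x8 saliency map (stand-in for a WhereCNN prediction);
# higher values mark locations more likely to be fixated next.
saliency = np.zeros((8, 8))
saliency[2, 5] = 5.0   # one strongly salient location

def sample_fixation(saliency_map, temperature=1.0):
    """Sample a (row, col) fixation from a softmax over the saliency map."""
    logits = saliency_map.ravel() / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    idx = rng.choice(probs.size, p=probs)
    return np.unravel_index(idx, saliency_map.shape)

fixations = [sample_fixation(saliency) for _ in range(200)]
# Most sampled fixations should land on the salient location (2, 5).
```

Sampling rather than taking the argmax keeps exploration stochastic, which is useful when fixations feed back into training.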


Evaluating CxG Generalisation in LLMs via Construction-Based NLI Fine Tuning

Mackintosh, Tom, Madabushi, Harish Tayyar, Bonial, Claire

arXiv.org Artificial Intelligence

We probe large language models' ability to learn the deep form-meaning mappings defined by construction grammars. We introduce ConTest-NLI, a benchmark of 80k sentences covering eight English constructions ranging from highly lexicalized to highly schematic. Our pipeline generates diverse synthetic NLI triples via templating and a model-in-the-loop filter, supplemented by human validation to ensure challenge and label reliability. Zero-shot tests on leading LLMs reveal a 24% drop in accuracy between naturalistic (88%) and adversarial data (64%), with schematic patterns proving hardest. Fine-tuning on a subset of ConTest-NLI yields up to a 9% improvement, yet our results highlight persistent abstraction gaps in current LLMs and offer a scalable framework for evaluating construction-informed learning.
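Template-based NLI generation of the kind described can be sketched by expanding slot fillers into premise-hypothesis-label triples. The construction, slot fillers, and labels below are illustrative assumptions, not items from ConTest-NLI.

```python
import itertools

# Illustrative schematic construction: English caused-motion "X VERBed Y PREP Z".
subjects = ["The goalie", "The chef"]
verbs = ["kicked", "pushed"]
objects = ["the ball", "the cart"]
goals = ["into the net", "across the kitchen"]

def make_triples():
    """Expand all slot combinations into (premise, hypothesis, label) triples."""
    triples = []
    for s, v, o, g in itertools.product(subjects, verbs, objects, goals):
        premise = f"{s} {v} {o} {g}."
        # Entailment: the construction itself implies motion of the object.
        triples.append((premise, f"{o.capitalize()} moved.", "entailment"))
        # Contradiction: deny the motion the construction entails.
        triples.append((premise, f"{o.capitalize()} stayed in place.", "contradiction"))
    return triples

triples = make_triples()
print(len(triples), triples[0])
```

In the paper's pipeline, such raw expansions would then pass through a model-in-the-loop filter and human validation before entering the benchmark.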


BEFT: Bias-Efficient Fine-Tuning of Language Models

Huang, Baichuan, Balashankar, Ananth, Aminifar, Amir

arXiv.org Artificial Intelligence

Bias-only fine-tuning has the potential for unprecedented parameter efficiency. However, the link between fine-tuning different bias terms (i.e., bias terms in the query, key, or value projections) and downstream performance remains unclear. Existing approaches, e.g., those based on the magnitude of bias change or on empirical Fisher information, provide limited guidance for selecting the particular bias term for effective fine-tuning. In this paper, we propose an approach for selecting the bias term to be fine-tuned, forming the foundation of our bias-efficient fine-tuning (BEFT). We extensively evaluate our bias-efficient approach against other bias-selection approaches across a wide range of large language models (LLMs), spanning encoder-only and decoder-only architectures from 110M to 6.7B parameters. Our results demonstrate the effectiveness and superiority of our bias-efficient approach on diverse downstream tasks, including classification, multiple-choice, and generation tasks.
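The core mechanic of bias-only fine-tuning can be illustrated with a single linear projection: freeze the weights and update only the bias vector. The NumPy sketch below is a conceptual stand-in for tuning one selected bias term in a transformer projection; the dimensions, MSE objective, and learning rate are assumptions, not BEFT's selection method.

```python
import numpy as np

rng = np.random.default_rng(1)

# Tiny stand-in "projection layer": y = x @ W + b. In bias-only fine-tuning,
# W (and every other weight) stays frozen; only the bias vector b is updated.
d_in, d_out, n = 8, 4, 64
W = rng.standard_normal((d_in, d_out))       # frozen "pretrained" weights
b = np.zeros(d_out)                          # the bias term being fine-tuned
X = rng.standard_normal((n, d_in))
Y_target = X @ W + np.array([1.0, -2.0, 0.5, 3.0])  # task needs only a bias shift

lr = 0.5
for _ in range(100):
    pred = X @ W + b
    grad_b = 2 * (pred - Y_target).mean(axis=0)   # dMSE/db; W gets no gradient
    b -= lr * grad_b                              # bias-only update

print(np.round(b, 2))  # b converges toward the target offset
```

Only d_out parameters are trainable here, which is the extreme parameter efficiency the abstract refers to; BEFT's contribution is choosing which bias term to unfreeze.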


Meaning-infused grammar: Gradient Acceptability Shapes the Geometric Representations of Constructions in LLMs

Rakshit, Supantho, Goldberg, Adele

arXiv.org Artificial Intelligence

The usage-based constructionist (UCx) approach to language posits that language comprises a network of learned form-meaning pairings (constructions) whose use is largely determined by their meanings or functions, requiring them to be graded and probabilistic. This study investigates whether the internal representations in Large Language Models (LLMs) reflect the proposed function-infused gradience. We analyze representations of the English Double Object (DO) and Prepositional Object (PO) constructions in Pythia-$1.4$B, using a dataset of $5000$ sentence pairs systematically varied by human-rated preference strength for DO or PO. Geometric analyses show that the separability between the two constructions' representations, as measured by energy distance or Jensen-Shannon divergence, is systematically modulated by gradient preference strength, which depends on lexical and functional properties of sentences. That is, more prototypical exemplars of each construction occupy more distinct regions in activation space than sentences that could equally well have occurred in either construction. These results provide evidence that LLMs learn rich, meaning-infused, graded representations of constructions and support the use of geometric measures for analyzing representations in LLMs.
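One of the separability measures named above, energy distance, can be computed directly from two samples of representation vectors. The sketch below uses synthetic point clouds as stand-ins for Pythia activations; the dimensions and cloud offsets are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)

def energy_distance(X, Y):
    """Energy distance between two samples of row vectors:
    2*E||X-Y|| - E||X-X'|| - E||Y-Y'|| (pairwise-mean estimator)."""
    def mean_pdist(A, B):
        diff = A[:, None, :] - B[None, :, :]
        return np.linalg.norm(diff, axis=-1).mean()
    return 2 * mean_pdist(X, Y) - mean_pdist(X, X) - mean_pdist(Y, Y)

# Toy stand-ins for sentence representations of the two constructions.
proto_DO = rng.standard_normal((100, 32)) + 1.5   # prototypical DO exemplars
proto_PO = rng.standard_normal((100, 32)) - 1.5   # prototypical PO exemplars
mixed_A  = rng.standard_normal((100, 32)) + 0.2   # weak-preference sentences
mixed_B  = rng.standard_normal((100, 32)) - 0.2

# Prototypical exemplars separate far more than weak-preference ones,
# mirroring the gradience effect the paper reports.
print(energy_distance(proto_DO, proto_PO), energy_distance(mixed_A, mixed_B))
```

Energy distance is zero when the two distributions coincide and grows with their separation, which makes it a natural graded separability score.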


Robo-DM: Data Management For Large Robot Datasets

Chen, Kaiyuan, Fu, Letian, Huang, David, Zhang, Yanxiang, Chen, Lawrence Yunliang, Huang, Huang, Hari, Kush, Balakrishna, Ashwin, Xiao, Ted, Sanketi, Pannag R, Kubiatowicz, John, Goldberg, Ken

arXiv.org Artificial Intelligence

Recent results suggest that very large datasets of teleoperated robot demonstrations can be used to train transformer-based models that have the potential to generalize to new scenes, robots, and tasks. However, curating, distributing, and loading large datasets of robot trajectories, which typically consist of video, textual, and numerical modalities - including streams from multiple cameras - remains challenging. We propose Robo-DM, an efficient open-source cloud-based data management toolkit for collecting, sharing, and learning with robot data. With Robo-DM, robot datasets are stored in a self-contained format with Extensible Binary Meta Language (EBML). Robo-DM can significantly reduce the size of robot trajectory data, transfer costs, and data load time during training. Compared to the RLDS format used in OXE datasets, Robo-DM's compression saves space by up to 70x (lossy) and 3.5x (lossless). Robo-DM also accelerates data retrieval by load-balancing video decoding with memory-mapped decoding caches. Compared to LeRobot, a framework that also uses lossy video compression, Robo-DM is up to 50x faster when decoding sequentially. We physically evaluate a model trained with Robo-DM's lossy compression on a pick-and-place task using the In-Context Robot Transformer. With 75x compression of the original dataset, Robo-DM does not suffer a reduction in downstream task accuracy.
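The decoding-cache idea behind fast retrieval can be illustrated at a small scale: with lossy video compression, random frame access is expensive, so decoded results are memoized and reused across training passes. The snippet below is a conceptual sketch only; the function names are invented, and Robo-DM itself uses memory-mapped caches with load-balanced video decoding rather than an in-process memo table.

```python
from functools import lru_cache

decode_calls = 0

@lru_cache(maxsize=256)
def decoded_frame(trajectory_id: str, frame_idx: int) -> bytes:
    """Pretend to decode one frame of a compressed robot-camera stream."""
    global decode_calls
    decode_calls += 1                               # count real decode work
    return f"{trajectory_id}:{frame_idx}".encode()  # stand-in for pixel data

# Training epochs revisit the same frames; only the first pass pays the
# decode cost, later passes hit the cache.
for _epoch in range(3):
    for i in range(10):
        decoded_frame("traj_0042", i)

print(decode_calls)  # 10 decodes for 30 accesses
```

Backing such a cache with memory-mapped files, as Robo-DM does, lets it outlive a single process and be shared across data-loader workers.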