
Collaborating Authors

goldberg



Philly's 'transit vigilante' created a real-time bus tracker for his neighbors

Popular Science

With a sports timer and some clever coding, Max Goldberg built a DIY display that tells South Philly commuters exactly when their next bus will arrive. Philadelphia's mass transit system has had a rough go of it lately. The Pennsylvania city's main public transit provider, SEPTA, has been dealing with massive service cuts, including the elimination of entire bus routes. But South Philly resident Max Goldberg is undeterred.


Machine learning methods fail to provide cohesive atheoretical construction of personality traits from semantic embeddings

Bouguettaya, Ayoub, Stuart, Elizabeth M.

arXiv.org Artificial Intelligence

Here, we test this hypothesis using novel machine learning methods to create a bottom-up, atheoretical model of personality from the same trait-descriptive adjective list that led to the dominant contemporary model of personality (the Big Five). We then compare the descriptive utility of the resulting lexical clusters against the established Big Five personality model in how well each describes conversations online (on Reddit forums). Our analysis of 1 million online comments shows that the Big Five model provides a much more powerful and interpretable description of these communities and the differences between them. Specifically, the dimensions of Agreeableness, Conscientiousness, and Neuroticism effectively distinguish Reddit communities. In contrast, our lexical clusters do not provide meaningful distinctions and fail to describe the variation across communities. Validation against the International Personality Item Pool confirmed the Big Five model's superior psychometric coherence, and our machine learning methods notably failed to recover the trait of Extraversion. These results affirm the robustness of the Big Five, while also showing that the semantic structure of personality likely depends on social context. Our findings suggest that while machine learning can help with understanding and explaining human behavior, especially by checking the ecological validity of existing theories, machine learning methods may not be able to replace established psychological theories.
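As a toy illustration of the bottom-up, embedding-driven clustering this abstract describes (the vectors and seed words below are made up for illustration, not the authors' data or method), adjectives can be grouped by assigning each to its most similar seed in embedding space:

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Made-up 2-D "embeddings" for trait adjectives (purely illustrative)
emb = {
    "kind": (0.9, 0.1), "friendly": (0.85, 0.2),
    "anxious": (0.1, 0.95), "nervous": (0.15, 0.9),
}

def nearest_seed(word, seeds):
    """Assign a word to the seed adjective with the most similar embedding."""
    return max(seeds, key=lambda s: cosine(emb[word], emb[s]))

clusters = {"kind": ["kind"], "anxious": ["anxious"]}
for w in ("friendly", "nervous"):
    clusters[nearest_seed(w, clusters)].append(w)
print(clusters)
```

In this sketch the clusters fall out atheoretically from vector geometry alone; the paper's point is that such clusters need not align with, or replicate, the Big Five dimensions.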


Appendix: A Dual-Stream Neural Network Explains the Functional Segregation of Dorsal and Ventral Visual Pathways in Human Brains

Choi, Minkyu

Neural Information Processing Systems

Fig. S1 displays the full set of region labels corresponding to the regions containing significant voxels from Fig. 3(a) in the main text. As detailed in Section 3.1 of the main text, our model underwent a three-stage training process. After the pre-training in Stage 1, we conducted a fine-tuning process using the learned fixations from the WhereCNN: the pre-trained WhereCNN was incorporated to guide the WhatCNN's fixations, with fixations sampled from the WhereCNN's predicted saliency maps. All training stages were conducted using four NVIDIA A40 GPUs. Figure S2: Process of determining the next fixation point given the current fixation.
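The fixation-sampling step can be sketched minimally: draw an (x, y) cell with probability proportional to its saliency value. This is a toy stand-in for sampling from the WhereCNN's predicted map, not the authors' implementation:

```python
import random

def sample_fixation(saliency, rng=random.Random(0)):
    """Sample an (x, y) fixation with probability proportional to saliency values."""
    cells = [(x, y) for y, row in enumerate(saliency) for x in range(len(row))]
    weights = [saliency[y][x] for x, y in cells]
    return rng.choices(cells, weights=weights, k=1)[0]

# All saliency mass on the bottom-right cell, so the fixation lands there
sal = [[0.0, 0.0],
       [0.0, 1.0]]
print(sample_fixation(sal))  # → (1, 1)
```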


Evaluating CxG Generalisation in LLMs via Construction-Based NLI Fine Tuning

Mackintosh, Tom, Madabushi, Harish Tayyar, Bonial, Claire

arXiv.org Artificial Intelligence

We probe large language models' ability to learn the deep form-meaning mappings defined by construction grammars. We introduce ConTest-NLI, a benchmark of 80k sentences covering eight English constructions ranging from highly lexicalized to highly schematic. Our pipeline generates diverse synthetic NLI triples via templating and a model-in-the-loop filter, supplemented by human validation to ensure challenge and label reliability. Zero-shot tests on leading LLMs reveal a 24-percentage-point drop in accuracy between naturalistic (88%) and adversarial data (64%), with schematic patterns proving hardest. Fine-tuning on a subset of ConTest-NLI yields up to a 9% improvement, yet our results highlight persistent abstraction gaps in current LLMs and offer a scalable framework for evaluating construction-informed learning.
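Template-based generation of NLI triples can be sketched as follows; the transitive template and fillers here are entirely hypothetical and far simpler than the ConTest-NLI pipeline, which covers eight constructions and adds model-in-the-loop filtering:

```python
def make_nli_triples(subject, verb_past, verb_base, obj):
    """Generate one premise with an entailed, a contradictory, and a neutral
    hypothesis from a simple transitive template (illustrative only)."""
    premise = f"{subject} {verb_past} {obj}."
    return [
        (premise, f"{subject} {verb_past} something.", "entailment"),
        (premise, f"{subject} did not {verb_base} anything.", "contradiction"),
        (premise, f"{subject} was in a hurry.", "neutral"),
    ]

triples = make_nli_triples("The chef", "sliced", "slice", "the onion")
for premise, hypothesis, label in triples:
    print(label, "|", hypothesis)
```

Scaling this idea means varying the construction itself (e.g. schematic argument-structure patterns) rather than just the lexical fillers, which is where the abstract reports models struggle.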


BEFT: Bias-Efficient Fine-Tuning of Language Models

Huang, Baichuan, Balashankar, Ananth, Aminifar, Amir

arXiv.org Artificial Intelligence

Bias-only fine-tuning has the potential for unprecedented parameter efficiency. However, the link between fine-tuning different bias terms (i.e., bias terms in the query, key, or value projections) and downstream performance remains unclear. Existing approaches, e.g., those based on the magnitude of bias change or empirical Fisher information, provide limited guidance for selecting the particular bias term for effective fine-tuning. In this paper, we propose an approach for selecting the bias term to be fine-tuned, forming the foundation of our bias-efficient fine-tuning (BEFT). We extensively evaluate our bias-efficient approach against other bias-selection approaches, across a wide range of large language models (LLMs) spanning encoder-only and decoder-only architectures from 110M to 6.7B parameters. Our results demonstrate the effectiveness and superiority of our bias-efficient approach on diverse downstream tasks, including classification, multiple-choice, and generation tasks.
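Mechanically, bias-only fine-tuning amounts to freezing every weight matrix and marking only (a subset of) bias vectors as trainable. A minimal sketch with hypothetical BERT-style parameter names (the selection criterion itself, which is BEFT's contribution, is not shown):

```python
def select_bias_params(named_params, projection=None):
    """Return names of bias terms to fine-tune.

    named_params: iterable of parameter names from a transformer.
    projection: optionally restrict to one projection ("query", "key", "value").
    """
    chosen = []
    for name in named_params:
        if not name.endswith(".bias"):
            continue  # freeze all weights; only bias vectors are candidates
        if projection is None or f".{projection}." in name:
            chosen.append(name)
    return chosen

# Hypothetical parameter names in one transformer layer
params = [
    "layer.0.attention.query.weight",
    "layer.0.attention.query.bias",
    "layer.0.attention.key.bias",
    "layer.0.attention.value.bias",
    "layer.0.output.dense.bias",
]
print(select_bias_params(params, projection="query"))
```

In a real framework one would then set `requires_grad` only on the selected parameters; the open question the paper addresses is *which* projection's bias to pick.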


Meaning-infused grammar: Gradient Acceptability Shapes the Geometric Representations of Constructions in LLMs

Rakshit, Supantho, Goldberg, Adele

arXiv.org Artificial Intelligence

The usage-based constructionist (UCx) approach to language posits that language comprises a network of learned form-meaning pairings (constructions) whose use is largely determined by their meanings or functions, requiring them to be graded and probabilistic. This study investigates whether the internal representations in Large Language Models (LLMs) reflect the proposed function-infused gradience. We analyze representations of the English Double Object (DO) and Prepositional Object (PO) constructions in Pythia-$1.4$B, using a dataset of $5000$ sentence pairs systematically varied by human-rated preference strength for DO or PO. Geometric analyses show that the separability between the two constructions' representations, as measured by energy distance or Jensen-Shannon divergence, is systematically modulated by gradient preference strength, which depends on lexical and functional properties of sentences. That is, more prototypical exemplars of each construction occupy more distinct regions in activation space than sentences that could equally well have occurred in either construction. These results provide evidence that LLMs learn rich, meaning-infused, graded representations of constructions and offer support for using geometric measures to study representations in LLMs.
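One of the two separability measures named above, energy distance, has a compact definition: for samples X and Y it is 2·E‖X−Y‖ − E‖X−X′‖ − E‖Y−Y′‖. A minimal sketch on toy 2-D "activations" (the point clouds are invented; the paper works with Pythia-1.4B hidden states):

```python
import math

def _mean_pairwise(A, B):
    """Mean Euclidean distance over all cross pairs of two point clouds."""
    return sum(math.dist(a, b) for a in A for b in B) / (len(A) * len(B))

def energy_distance(X, Y):
    """Energy distance between two point clouds (e.g., per-sentence activations)."""
    return 2 * _mean_pairwise(X, Y) - _mean_pairwise(X, X) - _mean_pairwise(Y, Y)

# Toy clouds: a well-separated DO/PO pair scores higher than an overlapping one,
# mirroring how prototypical exemplars occupy more distinct regions
do_reps = [(0.0, 0.0), (0.1, 0.0)]
po_far  = [(5.0, 5.0), (5.1, 5.0)]
po_near = [(0.2, 0.0), (0.3, 0.0)]
print(energy_distance(do_reps, po_far) > energy_distance(do_reps, po_near))  # → True
```

Energy distance is zero for identical distributions and grows as the clouds separate, which is why it can serve as a graded separability score here.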


Robo-DM: Data Management For Large Robot Datasets

Chen, Kaiyuan, Fu, Letian, Huang, David, Zhang, Yanxiang, Chen, Lawrence Yunliang, Huang, Huang, Hari, Kush, Balakrishna, Ashwin, Xiao, Ted, Sanketi, Pannag R, Kubiatowicz, John, Goldberg, Ken

arXiv.org Artificial Intelligence

Recent results suggest that very large datasets of teleoperated robot demonstrations can be used to train transformer-based models that have the potential to generalize to new scenes, robots, and tasks. However, curating, distributing, and loading large datasets of robot trajectories, which typically consist of video, textual, and numerical modalities - including streams from multiple cameras - remains challenging. We propose Robo-DM, an efficient open-source cloud-based data management toolkit for collecting, sharing, and learning with robot data. With Robo-DM, robot datasets are stored in a self-contained format with Extensible Binary Meta Language (EBML). Robo-DM can significantly reduce the size of robot trajectory data, transfer costs, and data load time during training. Compared to the RLDS format used in OXE datasets, Robo-DM's compression saves space by up to 70x (lossy) and 3.5x (lossless). Robo-DM also accelerates data retrieval by load-balancing video decoding with memory-mapped decoding caches. Compared to LeRobot, a framework that also uses lossy video compression, Robo-DM is up to 50x faster when decoding sequentially. We physically evaluate a model trained with Robo-DM's lossy compression on a pick-and-place task using an In-Context Robot Transformer. With 75x compression of the original dataset, Robo-DM suffers no reduction in downstream task accuracy.


Real2Render2Real: Scaling Robot Data Without Dynamics Simulation or Robot Hardware

Yu, Justin, Fu, Letian, Huang, Huang, El-Refai, Karim, Ambrus, Rares Andrei, Cheng, Richard, Irshad, Muhammad Zubair, Goldberg, Ken

arXiv.org Artificial Intelligence

Scaling robot learning requires vast and diverse datasets. Yet the prevailing data collection paradigm, human teleoperation, remains costly and constrained by manual effort and physical robot access. We introduce Real2Render2Real (R2R2R), a novel approach for generating robot training data without relying on object dynamics simulation or teleoperation of robot hardware. The input is a smartphone-captured scan of one or more objects and a single video of a human demonstration. R2R2R renders thousands of high-visual-fidelity, robot-agnostic demonstrations by reconstructing detailed 3D object geometry and appearance and tracking 6-DoF object motion. R2R2R uses 3D Gaussian Splatting (3DGS) to enable flexible asset generation and trajectory synthesis for both rigid and articulated objects, converting these representations to meshes to maintain compatibility with scalable rendering engines like IsaacLab, with collision modeling disabled. Robot demonstration data generated by R2R2R integrates directly with models that operate on robot proprioceptive states and image observations, such as vision-language-action models (VLA) and imitation learning policies. Physical experiments suggest that models trained on R2R2R data from a single human demonstration can match the performance of models trained on 150 human teleoperation demonstrations. Project page: https://real2render2real.com


For GPT-4 as with Humans: Information Structure Predicts Acceptability of Long-Distance Dependencies

Cuneo, Nicole, Graves, Eleanor, Rakshit, Supantho, Goldberg, Adele E.

arXiv.org Artificial Intelligence

It remains debated how well any LM understands natural language or generates reliable metalinguistic judgments. Moreover, relatively little work has demonstrated that LMs can represent and respect subtle relationships between form and function proposed by linguists. We here focus on a particular such relationship established in recent work: English speakers' judgments about the information structure of canonical sentences predict independently collected acceptability ratings on corresponding 'long distance dependency' [LDD] constructions, across a wide array of base constructions and multiple types of LDDs. To determine whether any LM captures this relationship, we probe GPT-4 on the same tasks used with humans and on new extensions. Results reveal reliable metalinguistic skill on the information structure and acceptability tasks, replicating a striking interaction between the two, despite the zero-shot, explicit nature of the tasks and little to no chance of contamination [Studies 1a, 1b]. Study 2 manipulates the information structure of base sentences and confirms a causal relationship: increasing the prominence of a constituent in a context sentence increases the subsequent acceptability ratings on an LDD construction. The findings suggest a tight relationship between natural and GPT-4-generated English, and between information structure and syntax, which invites further exploration.