Feldman, Jacob
Reasoning or Simply Next Token Prediction? A Benchmark for Stress-Testing Large Language Models
Wang, Wentian, Kantor, Paul, Feldman, Jacob, Gallos, Lazaros, Wang, Hao
We propose MMLU-SR, a novel dataset designed to measure the true comprehension abilities of Large Language Models (LLMs) by challenging their performance in question-answering tasks with modified terms. We reasoned that an agent that "truly" understands a concept can still evaluate it when key terms are replaced by suitably defined alternate terms, and sought to differentiate such comprehension from mere text replacement. In our study, we modified standardized test questions by replacing a key term with a dummy word along with its definition. The key term could appear in the question, the answers, or both. Notwithstanding the high scores achieved by recent popular LLMs on the MMLU leaderboard, we found a substantial reduction in model performance after such replacement, suggesting poor comprehension. This new dataset provides a rigorous benchmark for testing true model comprehension and poses a challenge to the broader scientific community.
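The substitution step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the MMLU-SR construction code: the function name `substitute_item`, the `where` switch, whole-word case-insensitive matching, and the convention of prepending a one-sentence definition to the question are all hypothetical choices made here.

```python
import re
from typing import List, Sequence, Tuple


def _swap(text: str, term: str, dummy: str) -> str:
    # Replace whole-word, case-insensitive occurrences of `term` with `dummy`.
    pattern = re.compile(rf"\b{re.escape(term)}\b", flags=re.IGNORECASE)
    # A function replacement avoids backslash-escape surprises in `dummy`.
    return pattern.sub(lambda _m: dummy, text)


def substitute_item(
    question: str,
    choices: Sequence[str],
    term: str,
    dummy: str,
    definition: str,
    where: str = "both",
) -> Tuple[str, List[str]]:
    """Hypothetical MMLU-SR-style rewrite of one multiple-choice item.

    `where` selects the context to modify: "question", "answers", or "both".
    The dummy word's definition is prepended to the question (an assumed format).
    """
    if where not in ("question", "answers", "both"):
        raise ValueError(f"unknown context: {where!r}")
    new_q = _swap(question, term, dummy) if where in ("question", "both") else question
    new_choices = (
        [_swap(c, term, dummy) for c in choices]
        if where in ("answers", "both")
        else list(choices)
    )
    return f'Suppose "{dummy}" means {definition}. {new_q}', new_choices
```

For example, `substitute_item("Which rule gives the derivative of x^2?", ["The derivative is 2x", "The derivative is x"], "derivative", "zorp", "the instantaneous rate of change of a function")` returns the question `Suppose "zorp" means the instantaneous rate of change of a function. Which rule gives the zorp of x^2?` and the choices `["The zorp is 2x", "The zorp is x"]`. The definition is kept even when only the answers are modified, since a reader needs it to interpret the dummy word wherever it appears.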
Socially Cognizant Robotics for a Technology Enhanced Society
Dana, Kristin J., Andrews, Clinton, Bekris, Kostas, Feldman, Jacob, Stone, Matthew, Hemmer, Pernille, Mazzeo, Aaron, Salzman, Hal, Yi, Jingang
Applications of robotics (such as telepresence, transportation, elder-care, remote health care, cleaning, warehouse logistics, and delivery) are bringing significant changes in individuals' lives and are having profound social impact. Despite the envisioned potential of robotics, the goal of ubiquitous robot assistants augmenting quality of life (and quality of work life) has not yet been realized. Key challenges lie in the complexities of four overarching human-centric objectives that such systems must aim for: 1) improving quality of life of people, especially marginalized communities; 2) anticipating and mitigating unintended negative consequences of technological development; 3) enabling robots to adapt to the desires and needs of human counterparts; 4) respecting the need for human autonomy and agency. Pursuing these objectives requires an integrated cohort of technologists, behavioral scientists and social scientists with a shared vision to pursue a deep, multidisciplinary understanding of how robots interact with individuals and society. We introduce a new term, socially cognizant robotics, to describe this multi-faceted interdisciplinary branch of technology. The emerging practitioner, the socially cognizant roboticist, represents the convergence of socially aware technologists, who can develop intelligent devices that adapt to human and social behavior; and technology-aware social scientists and policymakers, who can translate studies of robotics' social effects into actionable and technically-viable principles and policies. A primary element of socially cognizant robotics is a deliberate "invitation to the table" for social scientists, who bring analytical perspectives and methods that are not typically present in robotics. These perspectives cover two levels of human-technology interaction that we view as essential: the human-robot dyad (Section 2) and the robot-society dyad (Section 3). 
Figure 1 illustrates how these levels might operate in the context of the workplace and everyday life.
Toward a Taxonomy and Computational Models of Abnormalities in Images
Saleh, Babak (Rutgers University) | Elgammal, Ahmed (Rutgers University) | Feldman, Jacob (Rutgers University) | Farhadi, Ali (University of Washington)
The human visual system can spot an abnormal image and reason about what makes it strange. This task has not received enough attention in computer vision. In this paper we study various types of atypicality in images more comprehensively than has been done before. We propose a new dataset of abnormal images showing a wide range of atypicalities. We design human-subject experiments to discover a coarse taxonomy of the reasons for abnormality. Our experiments reveal three major categories of abnormality: object-centric, scene-centric, and contextual. Based on this taxonomy, we propose a comprehensive computational model that can predict all of these types of abnormality in images and outperforms prior work in abnormality recognition.
A Bayesian Approach to Perceptual 3D Object-Part Decomposition Using Skeleton-Based Representations
El-Gaaly, Tarek (Rutgers University) | Froyen, Vicky (Rutgers University) | Elgammal, Ahmed (Rutgers University) | Feldman, Jacob (Rutgers University) | Singh, Manish (Rutgers University)
We present a probabilistic approach to shape decomposition that creates a skeleton-based shape representation of a 3D object while simultaneously decomposing it into constituent parts. Our approach probabilistically combines two prominent threads from the shape literature: skeleton-based (medial axis) representations of shape, and part-based representations of shape, in which shapes are combinations of primitive parts. Our approach recasts skeleton-based shape representation as a mixture estimation problem, allowing us to apply probabilistic estimation techniques to the problem of 3D shape decomposition, extending earlier work on the 2D case. The estimated 3D shape decompositions approximate human shape decomposition judgments. We present a tractable implementation of the framework, which begins by over-segmenting objects at concavities, and then probabilistically merges them to create a distribution over possible decompositions. This results in a hierarchy of decompositions at different structural scales, again closely matching known properties of human shape representation. The probabilistic estimation procedures that arise naturally in the model allow effective prediction of missing parts. We present results on shapes from a standard database illustrating the effectiveness of the approach.
A Bayesian Framework for Figure-Ground Interpretation
Froyen, Vicky, Feldman, Jacob, Singh, Manish
Figure/ground assignment, in which the visual image is divided into nearer (figural) and farther (ground) surfaces, is an essential step in visual processing, but its underlying computational mechanisms are poorly understood. Figural assignment (often referred to as border ownership) can vary along a contour, suggesting a spatially distributed process whereby local and global cues are combined to yield local estimates of border ownership. In this paper we model figure/ground estimation in a Bayesian belief network, attempting to capture the propagation of border ownership across the image as local cues (contour curvature and T-junctions) interact with more global cues to yield a figure/ground assignment. Our network includes skeletal (medial axis) structure as a nonlocal factor, under the hypothesis that medial structure "draws" border ownership so that borders are owned by the skeletal hypothesis that best explains them. We also briefly present a psychophysical experiment in which we measured local border ownership along a contour at various distances from an inducing cue (a T-junction).
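To make the idea of combining local cues into a border-ownership estimate concrete, here is a deliberately simplified sketch. It is not the authors' belief network and performs no propagation between contour points: it swaps in naive-Bayes cue combination at a single contour point (cues assumed conditionally independent given which side owns the border). The exponential fading of a T-junction's influence with contour distance, the function names, and the parameters `prior_a` and `scale` are all hypothetical.

```python
import math
from typing import Sequence


def decayed_likelihood_ratio(lr_at_cue: float, distance: float, scale: float) -> float:
    # Hypothetical: a cue's evidence fades toward uninformative (ratio 1)
    # exponentially with distance along the contour from where it appears.
    return 1.0 + (lr_at_cue - 1.0) * math.exp(-distance / scale)


def ownership_posterior(prior_a: float, likelihood_ratios: Sequence[float]) -> float:
    """P(side A owns the border | cues) under a naive-Bayes assumption:
    cues are conditionally independent given which side owns the border.
    Each ratio is P(cue | A owns) / P(cue | B owns)."""
    log_odds = math.log(prior_a / (1.0 - prior_a))
    for lr in likelihood_ratios:
        log_odds += math.log(lr)  # independent evidence adds in log-odds
    return 1.0 / (1.0 + math.exp(-log_odds))
```

With a flat prior (`prior_a=0.5`), a curvature cue with ratio 1.5, and a T-junction cue with ratio 4.0 at the junction itself, the combined odds are 6, giving a posterior of 6/7 (about 0.857). Far from the junction the T-junction ratio decays to 1 and the posterior falls back to the curvature-only value of 0.6, which mirrors, in toy form, the distance manipulation in the experiment.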
Categorization Under Complexity: A Unified MDL Account of Human Learning of Regular and Irregular Categories
Fass, David, Feldman, Jacob
We present an account of human concept learning (that is, learning of categories from examples) based on the principle of minimum description length (MDL). In support of this theory, we tested a wide range of two-dimensional concept types, including both regular (simple) and highly irregular (complex) structures, and found that the MDL theory gives a good account of subjects' performance. This suggests that the intrinsic complexity of a concept (that is, its description length) systematically influences its learnability.
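One way to see how description length can separate regular from irregular concepts is a toy code over a discrete 2D grid. This is a hypothetical stand-in, not Fass and Feldman's coding scheme: a concept is a set of grid cells; it is described by an exact cover of axis-aligned rectangles found greedily (so the result is an upper bound on the minimum under this code); each rectangle costs four coordinates of log2(grid_size) bits; and the cost of stating how many rectangles there are is ignored.

```python
import math
from typing import List, Set, Tuple

Cell = Tuple[int, int]            # (row, column) on a grid_size x grid_size grid
Rect = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def greedy_rectangle_cover(cells: Set[Cell]) -> List[Rect]:
    """Exactly cover `cells` with axis-aligned rectangles, greedily growing
    each one rightward and then downward from the top-left uncovered cell."""
    remaining = set(cells)
    rects: List[Rect] = []
    while remaining:
        top, left = min(remaining)  # row-major first uncovered cell
        right = left
        while (top, right + 1) in remaining:
            right += 1
        bottom = top
        while all((bottom + 1, c) in remaining for c in range(left, right + 1)):
            bottom += 1
        for r in range(top, bottom + 1):
            for c in range(left, right + 1):
                remaining.discard((r, c))
        rects.append((top, left, bottom, right))
    return rects


def description_length_bits(cells: Set[Cell], grid_size: int) -> float:
    # Four coordinates per rectangle, each needing log2(grid_size) bits.
    return 4 * math.log2(grid_size) * len(greedy_rectangle_cover(cells))
```

On an 8 by 8 grid, a solid 4 by 4 block (`{(r, c) for r in range(4) for c in range(4)}`) is one rectangle and costs 12 bits, while four cells on the diagonal (`{(i, i) for i in range(4)}`) need four rectangles and cost 48 bits. Under the MDL account, the more compressible block would be predicted to be easier to learn; a richer code (for example one allowing exceptions or rotated regions) would change the numbers but not the logic.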