Monfort, Mathew
Precise Model Benchmarking with Only a Few Observations
Fogliato, Riccardo, Patil, Pratik, Akpinar, Nil-Jana, Monfort, Mathew
How can we precisely estimate a large language model's (LLM) accuracy on questions belonging to a specific topic within a larger question-answering dataset? The standard direct estimator, which averages the model's accuracy on the questions in each subgroup, may exhibit high variance for subgroups (topics) with small sample sizes. Synthetic regression modeling, which leverages the model's accuracy on questions about other topics, may yield biased estimates that are too unreliable for large subgroups. We prescribe a simple yet effective solution: an empirical Bayes (EB) estimator that balances direct and regression estimates for each subgroup separately, improving the precision of subgroup-level estimates of model performance. Our experiments on multiple datasets show that this approach consistently provides more precise estimates of the LLM performance compared to the direct and regression approaches, achieving substantial reductions in the mean squared error. Confidence intervals for EB estimates also have near-nominal coverage and are narrower compared to those for the direct estimator. Additional experiments on tabular and vision data validate the benefits of this EB approach.
Moments in Time Dataset: one million videos for event understanding
Monfort, Mathew, Andonian, Alex, Zhou, Bolei, Ramakrishnan, Kandan, Bargal, Sarah Adel, Yan, Tom, Brown, Lisa, Fan, Quanfu, Gutfruend, Dan, Vondrick, Carl, Oliva, Aude
We present the Moments in Time Dataset, a large-scale human-annotated collection of one million short videos corresponding to dynamic events unfolding within three seconds. Modeling the spatial-audio-temporal dynamics even for actions occurring in 3 second videos poses many challenges: meaningful events do not include only people, but also objects, animals, and natural phenomena; visual and auditory events can be symmetrical in time ("opening" is "closing" in reverse), and either transient or sustained. We describe the annotation process of our dataset (each video is tagged with one action or activity label among 339 different classes), analyze its scale and diversity in comparison to other large-scale video datasets for action recognition, and report results of several baseline models addressing separately, and jointly, three modalities: spatial, temporal and auditory. The Moments in Time dataset, designed to have a large coverage and diversity of events in both visual and auditory modalities, can serve as a new challenge to develop models that scale to the level of complexity and abstract reasoning that a human processes on a daily basis.
Softstar: Heuristic-Guided Probabilistic Inference
Monfort, Mathew, Lake, Brenden M., Lake, Brenden M., Ziebart, Brian, Lucey, Patrick, Tenenbaum, Josh
Recent machine learning methods for sequential behavior prediction estimate the motives of behavior rather than the behavior itself. This higher-level abstraction improves generalization in different prediction settings, but computing predictions often becomes intractable in large decision spaces. We propose the Softstar algorithm, a softened heuristic-guided search technique for the maximum entropy inverse optimal control model of sequential behavior. This approach supports probabilistic search with bounded approximation error at a significantly reduced computational cost when compared to sampling based methods. We present the algorithm, analyze approximation guarantees, and compare performance with simulation-based inference on two distinct complex decision tasks.
Intent Prediction and Trajectory Forecasting via Predictive Inverse Linear-Quadratic Regulation
Monfort, Mathew (University of Illinois at Chicago) | Liu, Anqi (University of Illinois at Chicago) | Ziebart, Brian (University of Illinois at Chicago)
To facilitate interaction with people, robots must not only recognize current actions, but also infer a person's intentions and future behavior. Recent advances in depth camera technology have significantly improved human motion tracking. However, the inherent high dimensionality of interacting with the physical world makes efficiently forecasting human intention and future behavior a challenging task. Predictive methods that estimate uncertainty are therefore critical for supporting appropriate robotic responses to the many ambiguities posed within the human-robot interaction setting. We address these two challenges, high dimensionality and uncertainty, by employing predictive inverse optimal control methods to estimate a probabilistic model of human motion trajectories. Our inverse optimal control formulation estimates quadratic cost functions that best rationalize observed trajectories framed as solutions to linear-quadratic regularization problems. The formulation calibrates its uncertainty from observed motion trajectories, and is efficient in high-dimensional state spaces with linear dynamics. We demonstrate its effectiveness on a task of anticipating the future trajectories, target locations and activity intentions of hand motions.