todorov
- Asia > Middle East > Jordan (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.95)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.71)
Partially Observable Reference Policy Programming: Solving POMDPs Sans Numerical Optimisation
Kim, Edward, Kurniawati, Hanna
This paper proposes Partially Observable Reference Policy Programming, a novel anytime online approximate POMDP solver which samples meaningful future histories very deeply while simultaneously forcing a gradual policy update. We provide theoretical guarantees for the algorithm's underlying scheme which say that the performance loss is bounded by the average of the sampling approximation errors rather than the usual maximum, a crucial requirement given the sampling sparsity of online planning. Empirical evaluations on two large-scale problems with dynamically evolving environments -- including a helicopter emergency scenario in the Corsica region requiring approximately 150 planning steps -- corroborate the theoretical results and indicate that our solver considerably outperforms current online benchmarks.
- Oceania > Australia (0.04)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Information Technology > Artificial Intelligence > Robots (0.95)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.73)
The Art of Audience Engagement: LLM-Based Thin-Slicing of Scientific Talks
Schmälzle, Ralf, Lim, Sue, Du, Yuetong, Bente, Gary
This paper examines the thin-slicing approach - the ability to make accurate judgments based on minimal information - in the context of scientific presentations. Drawing on research from nonverbal communication and personality psychology, we show that brief excerpts (thin slices) reliably predict overall presentation quality. Using a novel corpus of over one hundred real-life science talks, we employ Large Language Models (LLMs) to evaluate transcripts of full presentations and their thin slices. By correlating LLM-based evaluations of short excerpts with full-talk assessments, we determine how much information is needed for accurate predictions. Our results demonstrate that LLM-based evaluations align closely with human ratings, proving their validity, reliability, and efficiency. Critically, even very short excerpts (less than 10 percent of a talk) strongly predict overall evaluations. This suggests that the first moments of a presentation convey relevant information that is used in quality evaluations and can shape lasting impressions. The findings are robust across different LLMs and prompting strategies. This work extends thin-slicing research to public speaking and connects theories of impression formation to LLMs and current research on AI communication. We discuss implications for communication and social cognition research on message reception. Lastly, we suggest an LLM-based thin-slicing framework as a scalable feedback tool to enhance human communication.
- North America > United States (0.28)
- Europe (0.28)
Towards Generalization and Simplicity in Continuous Control
Aravind Rajeswaran, Kendall Lowrey, Emanuel V. Todorov, Sham M. Kakade
This work shows that policies with simple linear and RBF parameterizations can be trained to solve a variety of widely studied continuous control tasks, including the gym-v1 benchmarks. The performance of these trained policies are competitive with state of the art results, obtained with more elaborate parameterizations such as fully connected neural networks. Furthermore, the standard training and testing scenarios for these tasks are shown to be very limited and prone to over-fitting, thus giving rise to only trajectory-centric policies. Training with a diverse initial state distribution induces more global policies with better generalization. This allows for interactive control scenarios where the system recovers from large on-line perturbations; as shown in the supplementary video.
- Asia > Middle East > Jordan (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.95)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.91)
Bayesian Inverse Reinforcement Learning for Collective Animal Movement
Schafer, Toryn L. J., Wikle, Christopher K., Hooten, Mevin B.
Agent-based methods allow for defining simple rules that generate complex group behaviors. The governing rules of such models are typically set a priori and parameters are tuned from observed behavior trajectories. Instead of making simplifying assumptions across all anticipated scenarios, inverse reinforcement learning provides inference on the short-term (local) rules governing long term behavior policies by using properties of a Markov decision process. We use the computationally efficient linearly-solvable Markov decision process to learn the local rules governing collective movement for a simulation of the self propelled-particle (SPP) model and a data application for a captive guppy population. The estimation of the behavioral decision costs is done in a Bayesian framework with basis function smoothing. We recover the true costs in the SPP simulation and find the guppies value collective movement more than targeted movement toward shelter.
- North America > United States > Colorado (0.04)
- Africa > Togo (0.04)
- North America > United States > New Jersey > Mercer County > Princeton (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)
Machines Learn Appearance Bias in Face Recognition
We seek to determine whether state-of-the-art, black box face recognition techniques can learn first-impression appearance bias from human annotations. With FaceNet, a popular face recognition architecture, we train a transfer learning model on human subjects' first impressions of personality traits in other faces. We measure the extent to which this appearance bias is embedded and benchmark learning performance for six different perceived traits. In particular, we find that our model is better at judging a person's dominance based on their face than other traits like trustworthiness or likeability, even for emotionally neutral faces. We also find that our model tends to predict emotions for deliberately manipulated faces with higher accuracy than for randomly generated faces, just like a human subject. Our results lend insight into the manner in which appearance biases may be propagated by standard face recognition models.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Virginia (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
Learning Complex Dexterous Manipulation with Deep Reinforcement Learning and Demonstrations
Rajeswaran, Aravind, Kumar, Vikash, Gupta, Abhishek, Vezzani, Giulia, Schulman, John, Todorov, Emanuel, Levine, Sergey
Multi-fingered dexterous manipulators are crucial for robots to function in human-centric environments, due to their versatility and potential to enable a large variety of contact-rich tasks, such as in-hand manipulation, complex grasping, and tool use. However, this versatility comes at the price of high dimensional observation and action spaces, complex and discontinuous contact patterns, and under-actuation during nonprehensile manipulation. This makes dexterous manipulation with multi-fingered hands a challenging problem. Dexterous manipulation behaviors with multi-fingered hands have previously been obtained using model-based trajectory optimization methods [31], [24]. However, these methods typically rely on accurate dynamics models and state estimates, which are often difficult to obtain for contact rich manipulation tasks, especially in the real world. Reinforcement learning provides a model agnostic approach that circumvents these issues. Indeed, model-free methods have been used for acquiring manipulation skills [52], [13], but so far have been limited to simpler behaviors with 2-3 finger hands or wholearm manipulators, which do not capture the challenges of highdimensional multi-fingered hands.
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
- Asia > Middle East > Jordan (0.04)
Towards Generalization and Simplicity in Continuous Control
Rajeswaran, Aravind, Lowrey, Kendall, Todorov, Emanuel V., Kakade, Sham M.
The remarkable successes of deep learning in speech recognition and computer vision have motivated efforts to adapt similar techniques to other problem domains, including reinforcement learning (RL). Consequently, RL methods have produced rich motor behaviors on simulated robot tasks, with their success largely attributed to the use of multi-layer neural networks. This work is among the first to carefully study what might be responsible for these recent advancements. Our main result calls this emerging narrative into question by showing that much simpler architectures -- based on linear and RBF parameterizations -- achieve comparable performance to state of the art results. We not only study different policy representations with regard to performance measures at hand, but also towards robustness to external perturbations. We again find that the learned neural network policies --- under the standard training scenarios --- are no more robust than linear (or RBF) policies; in fact, all three are remarkably brittle. Finally, we then directly modify the training scenarios in order to favor more robust policies, and we again do not find a compelling case to favor multi-layer architectures. Overall, this study suggests that multi-layer architectures should not be the default choice, unless a side-by-side comparison to simpler architectures shows otherwise. More generally, we hope that these results lead to more interest in carefully studying the architectural choices, and associated trade-offs, for training generalizable and robust policies.
First impressions are always WRONG says Princeton Uni
Your first impressions on meeting a new person are likely to be wrong, according to one leading scientist. The assumptions we make when meeting new people are based largely on their facial expressions and appearance, but this rarely matches up to their personality. And these hang-ups may spoil our chances of finding a life partner or landing the perfect job, according to Professor Alex Todorov, from Princeton University in New Jersey. Faces that look happy, even if they're not smiling, (left) are commonly rated as more trustworthy than faces that appear angry (right). First impressions are likely to be wrong as they are based on shallow assumptions about appearances, according to one leading expert.
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Artificial Intelligence > Vision > Face Recognition (0.36)
Fast rates for online learning in Linearly Solvable Markov Decision Processes
We study the problem of online learning in a class of Markov decision processes known as linearly solvable MDPs. In the stationary version of this problem, a learner interacts with its environment by directly controlling the state transitions, attempting to balance a fixed state-dependent cost and a certain smooth cost penalizing extreme control inputs. In the current paper, we consider an online setting where the state costs may change arbitrarily between consecutive rounds, and the learner only observes the costs at the end of each respective round. We are interested in constructing algorithms for the learner that guarantee small regret against the best stationary control policy chosen in full knowledge of the cost sequence. Our main result is showing that the smoothness of the control cost enables the simple algorithm of following the leader to achieve a regret of order $\log^2 T$ after $T$ rounds, vastly improving on the best known regret bound of order $T^{3/4}$ for this setting.
- North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.71)
- Information Technology > Enterprise Applications > Human Resources > Learning Management (0.63)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)