vpa
Marvel: Accelerating Safe Online Reinforcement Learning with Finetuned Offline Policy
Chen, Keru, Wei, Honghao, Deng, Zhigang, Lin, Sen
The high costs and risks involved in extensive environment interactions hinder the practical application of current online safe reinforcement learning (RL) methods. While offline safe RL addresses this by learning policies from static datasets, its performance is usually limited by its reliance on data quality and by challenges with out-of-distribution (OOD) actions. Recent successes in offline-to-online (O2O) RL make it crucial to explore whether offline safe RL can be leveraged to facilitate faster and safer online policy learning, a direction that has yet to be fully investigated. To fill this gap, we first demonstrate that naively applying existing O2O algorithms from standard RL does not work well in the safe RL setting due to two unique challenges: \emph{erroneous Q-estimations}, resulting from offline-online objective mismatch and offline cost sparsity, and \emph{Lagrangian mismatch}, resulting from difficulties in aligning Lagrange multipliers between offline and online policies. To address these challenges, we introduce \textbf{Marvel}, a novel framework for O2O safe RL comprising two key components that work in concert: \emph{Value Pre-Alignment}, which aligns the Q-functions with the underlying truth before online learning, and \emph{Adaptive PID Control}, which effectively adjusts the Lagrange multipliers during online finetuning. Extensive experiments demonstrate that Marvel significantly outperforms existing baselines in both reward maximization and safety constraint satisfaction. By introducing the first policy-finetuning-based framework for O2O safe RL, compatible with many offline and online safe RL methods, our work has great potential to advance the field towards more efficient and practical safe RL solutions.
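As a rough illustration of how a PID controller can drive a Lagrange multiplier, the sketch below implements the standard PID Lagrangian update of Stooke et al. (2020): the multiplier reacts to the current constraint violation (P), the accumulated violation (I), and the rise in cost (D). It is not Marvel's Adaptive PID Control, whose adaptation rule the abstract does not specify; the class name, gains, and cost limit are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PIDLagrangian:
    """Generic PID Lagrange-multiplier controller (hypothetical sketch, not Marvel's rule)."""
    kp: float            # proportional gain (hypothetical, user-chosen)
    ki: float            # integral gain
    kd: float            # derivative gain
    cost_limit: float    # constraint threshold d in J_C(pi) <= d
    integral: float = 0.0               # accumulated violation, kept non-negative
    prev_cost: Optional[float] = None   # last observed episode cost

    def update(self, episode_cost: float) -> float:
        """Return the new multiplier lambda >= 0 given the latest episode cost."""
        error = episode_cost - self.cost_limit  # > 0 when the constraint is violated
        # Integral term is clipped at zero so long stretches of slack
        # do not build up a large negative reserve (anti-windup).
        self.integral = max(0.0, self.integral + error)
        # Derivative term only reacts to rising cost, as in Stooke et al. (2020).
        if self.prev_cost is None:
            derivative = 0.0
        else:
            derivative = max(0.0, episode_cost - self.prev_cost)
        self.prev_cost = episode_cost
        lam = self.kp * error + self.ki * self.integral + self.kd * derivative
        return max(0.0, lam)  # multipliers for inequality constraints are non-negative


if __name__ == "__main__":
    ctrl = PIDLagrangian(kp=0.1, ki=0.01, kd=0.05, cost_limit=25.0)
    for cost in [35.0, 30.0, 10.0, 40.0]:
        print(f"episode cost {cost:5.1f} -> lambda {ctrl.update(cost):.3f}")
```

Warm-starting `integral` from a multiplier learned offline is one conceivable way to soften a Lagrangian mismatch between offline and online policies, but that is an assumption for illustration, not something the abstract states.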
VPA: Fully Test-Time Visual Prompt Adaptation
Sun, Jiachen, Ibrahim, Mark, Hall, Melissa, Evtimov, Ivan, Mao, Z. Morley, Ferrer, Cristian Canton, Hazirbas, Caner
Textual prompt tuning has demonstrated significant performance improvements in adapting natural language processing models to a variety of downstream tasks by treating hand-engineered prompts as trainable parameters. Inspired by the success of textual prompting, several studies have investigated the efficacy of visual prompt tuning. In this work, we present Visual Prompt Adaptation (VPA), the first framework that generalizes visual prompting with test-time adaptation. VPA introduces a small number of learnable tokens, enabling fully test-time and storage-efficient adaptation without requiring source-domain information. We examine our VPA design under diverse adaptation settings, encompassing single-image, batched-image, and pseudo-label adaptation. We evaluate VPA on multiple tasks, including out-of-distribution (OOD) generalization, corruption robustness, and domain adaptation. Experimental results reveal that VPA improves OOD generalization by 3.3% across various models, surpassing previous test-time approaches. Furthermore, we show that VPA improves corruption robustness by 6.5% compared to strong baselines. Finally, we demonstrate that VPA also boosts domain adaptation performance by a relative 5.2%. VPA also markedly improves the robustness of zero-shot recognition for vision-language models.
Pretrained Language Models as Visual Planners for Human Assistance
Patel, Dhruvesh, Eghbalzadeh, Hamid, Kamra, Nitin, Iuzzolino, Michael Louis, Jain, Unnat, Desai, Ruta
In our pursuit of advancing multi-modal AI assistants capable of guiding users to achieve complex multi-step goals, we propose the task of "Visual Planning for Assistance (VPA)". Given a succinct natural language goal, e.g., "make a shelf", and a video of the user's progress so far, the aim of VPA is to devise a plan, i.e., a sequence of actions such as "sand shelf", "paint shelf", etc., to realize the specified goal. This requires assessing the user's progress from the (untrimmed) video and relating it to the requirements of the natural language goal, i.e., deciding which actions to select and in what order. Consequently, this requires handling long video histories and arbitrarily complex action dependencies. To address these challenges, we decompose VPA into video action segmentation and forecasting. Importantly, we formulate the forecasting step as a multi-modal sequence modeling problem, allowing us to leverage the strength of pre-trained LMs (as the sequence model). This novel approach, which we call Visual Language Model based Planner (VLaMP), outperforms baselines across a suite of metrics that gauge the quality of the generated plans. Furthermore, through comprehensive ablations, we isolate the value of each component: language pre-training, visual observations, and goal information. We have open-sourced all the data, model checkpoints, and training code.
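The decomposition into segmentation and forecasting can be illustrated with a toy pipeline: per-frame action labels are collapsed into an ordered action history, and a plan is then decoded greedily, one action at a time, by a next-action model that sees the goal and everything predicted so far. In VLaMP that model is a pretrained LM conditioned on visual observations; here a hypothetical lookup table (`RECIPES`, `toy_next_action`) stands in for it, and the `"background"` label and the steps "cut wood" and "assemble shelf" are invented for illustration.

```python
from itertools import groupby
from typing import Callable, Dict, List

# Hypothetical stand-in for the pretrained LM: a canonical step order per goal.
RECIPES: Dict[str, List[str]] = {
    "make a shelf": ["cut wood", "sand shelf", "assemble shelf", "paint shelf"],
}


def segment(frame_labels: List[str]) -> List[str]:
    """Collapse per-frame action labels into an ordered list of action segments.

    Consecutive identical labels form one segment; frames labelled
    "background" (a hypothetical convention) carry no action and are dropped.
    """
    return [label for label, _ in groupby(frame_labels) if label != "background"]


def toy_next_action(goal: str, context: List[str]) -> str:
    """Hypothetical next-action model: first recipe step not yet in the context."""
    for step in RECIPES[goal]:
        if step not in context:
            return step
    return "done"


def forecast(goal: str, history: List[str],
             next_action: Callable[[str, List[str]], str], horizon: int) -> List[str]:
    """Greedy autoregressive forecasting: each predicted action is fed back as context."""
    context = list(history)
    plan: List[str] = []
    for _ in range(horizon):
        action = next_action(goal, context)
        plan.append(action)
        context.append(action)
    return plan


if __name__ == "__main__":
    frames = ["background", "cut wood", "cut wood", "background", "sand shelf", "sand shelf"]
    history = segment(frames)
    print("observed:", history)
    print("plan:", forecast("make a shelf", history, toy_next_action, horizon=2))
```

Because `forecast` only depends on the `next_action` callable, the lookup table could in principle be swapped for any sequence model that scores candidate actions given the goal and context.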
ViTAL: Vision-Based Terrain-Aware Locomotion for Legged Robots
Fahmi, Shamel, Barasuol, Victor, Esteban, Domingo, Villarreal, Octavio, Semini, Claudio
This work is on vision-based planning strategies for legged robots that separate locomotion planning into foothold selection and pose adaptation. Current pose adaptation strategies optimize the robot's body pose relative to given footholds. If these footholds are not reached, the robot may end up in a state with no reachable safe footholds. Therefore, we present a Vision-Based Terrain-Aware Locomotion (ViTAL) strategy that consists of novel pose adaptation and foothold selection algorithms. ViTAL introduces a different paradigm in pose adaptation: rather than optimizing the body pose relative to given footholds, it optimizes for the body pose that maximizes the chances of the legs reaching safe footholds. ViTAL plans footholds and poses based on skills that characterize the robot's capabilities and its terrain awareness. We use the 90 kg HyQ and 140 kg HyQReal quadruped robots to validate ViTAL, and show that they are able to climb various obstacles, including stairs, gaps, and rough terrain, at different speeds and gaits. We compare ViTAL with a baseline strategy that selects the robot pose based on given footholds, and show that ViTAL outperforms the baseline.
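The pose-adaptation idea, choosing the body pose that gives the legs the best chance of reaching safe footholds rather than fitting the pose to footholds chosen in advance, can be illustrated with a toy 2D scorer. The hip offsets, reach radius, and scoring rule below are hypothetical simplifications; ViTAL itself evaluates footholds and poses from vision using skills that characterize the robot, none of which this sketch models.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

# Hypothetical hip positions (m) in the body frame for a quadruped: LF, RF, LH, RH.
HIP_OFFSETS: List[Point] = [(0.4, 0.2), (0.4, -0.2), (-0.4, 0.2), (-0.4, -0.2)]
# Hypothetical maximum horizontal distance (m) from a hip to a reachable foothold.
REACH = 0.3


def reachable_safe_footholds(base: Point, safe: Sequence[Point]) -> List[int]:
    """For each leg, count the safe footholds within REACH of its hip at this base position."""
    counts = []
    for hx, hy in HIP_OFFSETS:
        hip = (base[0] + hx, base[1] + hy)
        counts.append(sum(1 for f in safe if math.dist(hip, f) <= REACH))
    return counts


def pose_score(base: Point, safe: Sequence[Point]) -> float:
    """Score a candidate pose by how many safe options the legs have.

    A pose that leaves any leg without a reachable safe foothold is rejected (-1).
    """
    counts = reachable_safe_footholds(base, safe)
    if min(counts) == 0:
        return -1.0
    return float(sum(counts))


def select_pose(candidates: Sequence[Point], safe: Sequence[Point]) -> Point:
    """Pick the candidate base position with the highest score (first one wins ties)."""
    return max(candidates, key=lambda b: pose_score(b, safe))


if __name__ == "__main__":
    safe_footholds = [(0.4, 0.2), (0.45, 0.25), (0.4, -0.2), (-0.4, 0.2), (-0.4, -0.2)]
    candidates = [(0.0, 0.0), (0.05, 0.0), (0.6, 0.0)]
    print("selected base position:", select_pose(candidates, safe_footholds))
```

Counting every reachable safe foothold, instead of checking a single pre-selected one, is what makes the score reward poses that keep several options open if the planned foothold is missed.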
Alexa, Can You Hear Me?
By exploring the various facets of gendering at play in the design of virtual personal assistants (VPAs), specifically Alexa, I argue that gendering Alexa as female poses societal harm insofar as she reproduces normative assumptions about the role of women as submissive, inferior, and secondary to men. AI-driven VPAs are proliferating, and the Amazon Echo is one of the most sought-after smart speakers globally. However, only recently has much research or attention focused on the gender bias noticeably programmed into this technology, specifically Alexa, which was intentionally designed, coded, and programmed by men and gendered to be distinctly female. Big Tech's decision to gender VPAs is most evident in their assigned female names, in their female voices, which users find more pleasant to give orders to than a male voice, and in their witty, flirtatious programmed responses. Through these interactions, Alexa performs gender as a feminized and sexualized entity imposed upon her by her Silicon Valley creators, a role that has the potential to unravel decades of social and political progress and to reinstate the gender bias of the past that women strived to eradicate. Looking ahead, TechCrunch forecasts that the use of voice assistants is set to triple over the next few years and estimates there will be ten billion digital voice assistants by 2023, up from the 2.5 billion assistants in use at the end of 2018. This growth is attributed to the popularity of the Amazon Echo.
E-learning and the challenge of the senses NEO BLOG
Online learning contrasts with a physical classroom environment, which offers opportunities to demonstrate concepts using all five senses: for instance, the color, smell, and touch of a flower, the sliminess of a mollusk, or the acrid smell of ammonia. The senses play an integral role in learning; one can go so far as to say that, from an evolutionary standpoint, learning is their sole function. We learn best through experience, and the more vivid that experience is, the deeper the learning and retention. Developmental psychology literature (both popular and academic) agrees that external stimuli, particularly in children, grow neural pathways and enhance learning. Young children have a surfeit of neuroglial cells, and the credo "use it or lose it" applies: neural cells and pathways not used in discovery and learning new things eventually degenerate and die. The most familiar example is the relative ease with which young children learn new languages compared with older learners.
In pursuit of the perfect AI voice
How developers are humanizing their virtual personal assistants. The virtual personal assistant is romanticized in utopian portrayals of the future from The Jetsons to Star Trek. It's the cultured, disembodied voice at humanity's beck and call, eager and willing to do any number of menial tasks. In its early real-world implementations, a virtual receptionist directed customers ('To hear more menu options, press 9'). It wasn't until 2011 that Apple released Siri and the public had its first interactions with a commercially viable, dynamic personal assistant. Since Siri's debut with the release of the iPhone 4S, Apple's massive customer base has only gotten larger; the company estimates that more than 700 million iPhones are currently in use worldwide. Amazon's Alexa and Microsoft's Cortana debuted in 2014; Google Assistant followed in 2016.
IT research firm Gartner predicts that many touch-required tasks on mobile apps will become voice activated within the next several years.
How AI can connect customers to your brand
A survey last year found that 98 percent of smartphone owners had used their device's artificial intelligence (AI)-based virtual personal assistant (VPA). Most of those surveyed felt inhibited about talking to their AI-powered VPAs in public, but that is likely to change as AI becomes more firmly entrenched in everyday life. As AI becomes part of daily living, brand leaders are realizing its potential to transform marketing. With AI, marketers can understand customers more completely and connect with them on a deeper, more personal level, allowing brands to deliver a buying experience that is relevant to each customer.