AITopics

2412.04426

Genre:

Research Report (1.00)
Instructional Material > Online (0.61)

Industry:

Education > Educational Setting > Online (0.49)
Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Artificial Intelligence Sep-26-2023

VPA: Fully Test-Time Visual Prompt Adaptation

Sun, Jiachen, Ibrahim, Mark, Hall, Melissa, Evtimov, Ivan, Mao, Z. Morley, Ferrer, Cristian Canton, Hazirbas, Caner

Textual prompt tuning has demonstrated significant performance improvements in adapting natural language processing models to a variety of downstream tasks by treating hand-engineered prompts as trainable parameters. Inspired by the success of textual prompting, several studies have investigated the efficacy of visual prompt tuning. In this work, we present Visual Prompt Adaptation (VPA), the first framework that generalizes visual prompting with test-time adaptation. VPA introduces a small number of learnable tokens, enabling fully test-time and storage-efficient adaptation without requiring source-domain information. We examine our VPA design under diverse adaptation settings, encompassing single-image, batched-image, and pseudo-label adaptation. We evaluate VPA on multiple tasks, including out-of-distribution (OOD) generalization, corruption robustness, and domain adaptation. Experimental results reveal that VPA enhances OOD generalization by 3.3% across various models, surpassing previous test-time approaches. Furthermore, we show that VPA improves corruption robustness by 6.5% compared to strong baselines. Finally, we demonstrate that VPA boosts domain adaptation performance by a relative 5.2%. VPA is also markedly effective in improving the robustness of zero-shot recognition for vision-language models.

test-time visual prompt adaptation, vpa

2309.15251

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

Patel, Dhruvesh, Eghbalzadeh, Hamid, Kamra, Nitin, Iuzzolino, Michael Louis, Jain, Unnat, Desai, Ruta

Pretrained Language Models as Visual Planners for Human Assistance

arXiv.org Artificial Intelligence Aug-26-2023

In our pursuit of advancing multi-modal AI assistants capable of guiding users to achieve complex multi-step goals, we propose the task of "Visual Planning for Assistance (VPA)". Given a succinct natural language goal, e.g., "make a shelf", and a video of the user's progress so far, the aim of VPA is to devise a plan, i.e., a sequence of actions such as "sand shelf", "paint shelf", etc., to realize the specified goal. This requires assessing the user's progress from the (untrimmed) video and relating it to the requirements of the natural language goal, i.e., deciding which actions to select and in what order. Consequently, this requires handling a long video history and arbitrarily complex action dependencies. To address these challenges, we decompose VPA into video action segmentation and forecasting. Importantly, we formulate the forecasting step as a multi-modal sequence modeling problem, allowing us to leverage the strength of pre-trained LMs (as the sequence model). This novel approach, which we call Visual Language Model based Planner (VLaMP), outperforms baselines across a suite of metrics that gauge the quality of the generated plans. Furthermore, through comprehensive ablations, we also isolate the value of each component: language pre-training, visual observations, and goal information. We have open-sourced all the data, model checkpoints, and training code.

baseline, history, vlamp, (15 more...)

2304.09179

Genre:

Workflow (1.00)
Overview (0.66)
Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Fahmi, Shamel, Barasuol, Victor, Esteban, Domingo, Villarreal, Octavio, Semini, Claudio

ViTAL: Vision-Based Terrain-Aware Locomotion for Legged Robots

arXiv.org Artificial Intelligence Dec-2-2022

This work addresses vision-based planning strategies for legged robots that separate locomotion planning into foothold selection and pose adaptation. Current pose adaptation strategies optimize the robot's body pose relative to given footholds. If these footholds are not reached, the robot may end up in a state with no reachable safe footholds. Therefore, we present a Vision-Based Terrain-Aware Locomotion (ViTAL) strategy that consists of novel pose adaptation and foothold selection algorithms. ViTAL introduces a different paradigm in pose adaptation: rather than optimizing the body pose relative to given footholds, it finds the body pose that maximizes the legs' chances of reaching safe footholds. ViTAL plans footholds and poses based on skills that characterize the robot's capabilities and its terrain-awareness. We use the 90 kg HyQ and 140 kg HyQReal quadruped robots to validate ViTAL, and show that they are able to climb various obstacles including stairs, gaps, and rough terrains at different speeds and gaits. We compare ViTAL with a baseline strategy that selects the robot pose based on given selected footholds, and show that ViTAL outperforms the baseline.

artificial intelligence, foothold, machine learning, (19 more...)

doi: 10.1109/TRO.2022.3222958

2212.01246

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Europe > Spain > Galicia > Madrid (0.04)
Europe > Italy > Liguria > Genoa (0.04)
(25 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots > Locomotion (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

#artificialintelligence Aug-16-2021, 01:35:12 GMT

Alexa, Can You Hear Me?

By exploring the various facets of gendering at play in the design of VPAs, specifically Alexa, I argue that gendering Alexa as female poses societal harm insofar as she reproduces normative assumptions about the role of women as submissive, inferior, and secondary to men. AI-driven virtual personal assistants (VPAs) are proliferating, with Amazon Echo being one of the most highly sought-after smart speakers globally. However, not until recently has there been much research or attention focused on the gender bias noticeably programmed into this technology, specifically Alexa, intentionally designed, coded, and programmed by men and gendered to be distinctly female. Big Tech's decision to gender VPAs is most evident in their assigned female names and their female voices, which users find more pleasant to give orders to than a male voice, and in their witty, flirtatious programmed responses. Through these interactions, Alexa performs gender as a feminized and sexualized entity, an identity imposed upon her by her Silicon Valley creators that has the potential to unravel decades of social and political progress and to reinstate the gender bias of the past that women strove to eradicate. TechCrunch forecasts that the use of voice assistants is set to triple over the next few years and estimates there will be ten billion digital voice assistants by 2023, up from the 2.5 billion assistants in use at the end of 2018. This growth is attributed to Amazon Echo being one of the most highly sought-after smart speakers in the world.

alexa, amazon, voice assistant, (14 more...)

Country:

North America > United States > California (0.49)
Europe > Denmark > Capital Region > Copenhagen (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)

Industry:

Law > Civil Rights & Constitutional Law (1.00)
Information Technology > Services (0.69)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)

#artificialintelligence Feb-16-2019, 12:11:34 GMT

E-learning and the challenge of the senses - NEO BLOG

Learning online is contrasted with the opportunities a physical classroom environment has to demonstrate concepts using all five senses: for instance the color, smell and touch of a flower, the sliminess of a mollusk, the acrid smell of ammonia. The senses play an integral role in learning – one can go so far as to say that from an evolutionary standpoint it is their sole function; we learn through experience best, and the more vivid that experience is, the deeper the learning and retention. Developmental psychology literature (both popular and academic) agrees that external stimuli – particularly in children – grow neural pathways, and exaggerate and enhance learning. Young children have a surfeit of neuroglial cells, and the credo "use it or lose it" applies – neural cells and pathways not used in discovery and learning new things eventually degenerate and die. The most prevalent example is the relative ease with which young children can learn new languages, compared with when they get older.

sense neo blog, sensory immersion, student, (9 more...)

Industry:

Education > Educational Setting > Online (0.55)
Education > Educational Technology > Educational Software > Computer Based Training (0.42)

Technology:

Information Technology > Enterprise Applications > Human Resources > Learning Management (0.55)
Information Technology > Artificial Intelligence (0.37)

#artificialintelligence Apr-22-2018, 15:46:05 GMT

In pursuit of the perfect AI voice

How developers are humanizing their virtual personal assistants. The virtual personal assistant is romanticized in utopian portrayals of the future from The Jetsons to Star Trek. It's the cultured, disembodied voice at humanity's beck and call, eager and willing to do any number of menial tasks. In its early real-world implementations, a virtual receptionist directed customers ('To hear more menu options, press 9'). It wasn't until 2011 that Apple released Siri and the public had its first interactions with a commercially viable, dynamic personal assistant. Since Siri's debut with the release of the iPhone 4S, Apple's massive customer base has only gotten larger; the company estimates that more than 700 million iPhones are currently in use worldwide. Amazon's Alexa and Microsoft's Cortana debuted in 2014; Google Assistant followed in 2016.

artificial intelligence, information, kleinberger, (17 more...)

Country: North America > United States > Alabama (0.04)

Industry:

Health & Medicine > Therapeutic Area (0.47)
Media > Television (0.34)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)

Engadget Apr-9-2018, 17:01:12 GMT

In pursuit of the perfect AI voice

The virtual personal assistant is romanticized in utopian portrayals of the future from The Jetsons to Star Trek. It's the cultured, disembodied voice at humanity's beck and call, eager and willing to do any number of menial tasks. In its early real-world implementations, a virtual receptionist directed customers ('To hear more menu options, press 9'). It wasn't until 2011 that Apple released Siri and the public had its first interactions with a commercially viable, dynamic personal assistant. Since Siri's debut with the release of the iPhone 4S, Apple's massive customer base has only gotten larger; the company estimates that more than 700 million iPhones are currently in use worldwide. Amazon's Alexa and Microsoft's Cortana debuted in 2014; Google Assistant followed in 2016. IT research firm Gartner predicts that many touch-required tasks on mobile apps will become voice activated within the next several years.

information, kleinberger, siri, (16 more...)

Country: North America > United States > Alabama (0.04)

Industry:

Health & Medicine > Therapeutic Area (0.47)
Media > Television (0.34)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)

Holistic Conversational Assistants

Ortiz, Charles L. (Nuance)

AI Magazine Mar-27-2018

This column describes work being done at Nuance Communications in developing virtual personal assistants (VPAs) that can engage in extended task-centered dialogues and that involve the coordination of many complex modules, along with conversational and collaborative support for such VPAs.

artificial intelligence, information, natural language, (17 more...)

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.05)
North America > United States > Pennsylvania (0.05)
North America > United States > Michigan (0.05)
North America > United States > Massachusetts (0.05)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)

#artificialintelligence Aug-19-2017, 00:44:07 GMT

How AI can connect customers to your brand

A survey last year found that 98 percent of smartphone owners had used their device's artificial intelligence-based virtual personal assistant (VPA). The majority of those surveyed were inhibited about talking to their artificial intelligence (AI)-powered VPAs in public, but that's likely to change as AI becomes more firmly entrenched in everyday life. As AI becomes a part of daily living, brand leaders are realizing the potential the technology has to transform marketing. With AI, marketers can understand customers more completely and connect with them on a deeper, more personal level. This can allow brands to deliver a buying experience that is relevant to the customer.

buying experience, connect customer, customer, (9 more...)

Industry: Retail (0.98)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)