Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play
Yemin Shi, Yu Shu, Siwei Dong, Guangyi Liu, Jaward Sesay, Jingwen Li, Zhiting Hu
A voice AI agent that blends seamlessly into daily life would interact with humans in an autonomous, real-time, and emotionally expressive manner. Rather than merely reacting to commands, it would continuously listen, reason, and respond proactively, fostering fluid, dynamic, and emotionally resonant interactions. We introduce Voila, a family of large voice-language foundation models that takes a step toward this vision. Voila moves beyond traditional pipeline systems by adopting a new end-to-end architecture that enables full-duplex, low-latency conversations while preserving rich vocal nuances such as tone, rhythm, and emotion. It achieves a response latency of just 195 milliseconds, surpassing the average human response time. Its hierarchical multi-scale Transformer integrates the reasoning capabilities of large language models (LLMs) with powerful acoustic modeling, enabling natural, persona-aware voice generation -- users can simply write text instructions to define the speaker's identity, tone, and other characteristics. Moreover, Voila supports over one million pre-built voices and efficient customization of new ones from audio samples as short as 10 seconds. Beyond spoken dialogue, Voila is designed as a unified model for a wide range of voice-based applications, including automatic speech recognition (ASR), text-to-speech (TTS), and, with minimal adaptation, multilingual speech translation. Voila is fully open-sourced to support open research and accelerate progress toward next-generation human-machine interactions.
- Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- (2 more...)
Voila raises $6M for its A.I.-powered storefronts for online creators – TechCrunch
Voila, a startup building infrastructure for social commerce, is bringing concepts from China's e-commerce market to the U.S. The company offers an alternative to the "link in bio" solutions used today by creators, like Linktree and Beacons, which direct followers to creators' social profiles, personal websites, and other recommendations. Instead of a link list or landing page, Voila creates A.I.-powered customizable, shoppable storefronts by automatically detecting items in the creators' online content and then generating shoppable links. With over 10,000 creators now signed up for the service, Voila is today announcing the close of its $6 million Series A led by Sinovation Ventures and joined by Fosun RZ Capital. To date, Voila has raised $7.5 million, including from investors SOSV and Artesian. Voila founder Ke Shang first moved from China to the U.S. to attend college.
- Asia > China (0.50)
- North America > United States > California (0.05)
- Europe (0.05)
- Information Technology > Services (0.51)
- Banking & Finance > Capital Markets (0.36)
Voila! Just like that, app turns your photo into a cartoon
DENVER (KDVR) – Another photo app is taking social media by storm. Voila is an app that uses artificial intelligence to turn your photo into different 3D cartoon versions. The app is simple to use: it lets you select a photo from your library or take one directly in the app. After you choose a photo, it takes only a few seconds before your picture is turned into a work of art.
VOILA: Visual-Observation-Only Imitation Learning for Autonomous Navigation
Haresh Karnan, Garrett Warnell, Xuesu Xiao, Peter Stone
While imitation learning for vision-based autonomous mobile robot navigation has recently received a great deal of attention in the research community, existing approaches typically require state-action demonstrations that were gathered using the deployment platform. However, what if one cannot easily outfit their platform to record these demonstration signals, or worse yet, the demonstrator does not have access to the platform at all? Is imitation learning for vision-based autonomous navigation even possible in such scenarios? In this work, we hypothesize that the answer is yes and that recent ideas from the Imitation from Observation (IfO) literature can be brought to bear such that a robot can learn to navigate using only egocentric video collected by a demonstrator, even in the presence of viewpoint mismatch. To this end, we introduce a new algorithm, Visual-Observation-Only Imitation Learning for Autonomous Navigation (VOILA), that can successfully learn navigation policies from a single video demonstration collected from a physically different agent. We evaluate VOILA in the photorealistic AirSim simulator and show that VOILA not only successfully imitates the expert, but also learns navigation policies that generalize to novel environments. Further, we demonstrate the effectiveness of VOILA in a real-world setting by showing that it allows a wheeled Jackal robot to successfully imitate a human walking through an environment, using a video recorded with a mobile phone camera.
Value of Information Lattice: Exploiting Probabilistic Independence for Effective Feature Subset Acquisition
We address the cost-sensitive feature acquisition problem, where misclassifying an instance is costly but the expected misclassification cost can be reduced by acquiring the values of the missing features. Because acquiring the features is costly as well, the objective is to acquire the right set of features so that the sum of the feature acquisition cost and misclassification cost is minimized. We describe the Value of Information Lattice (VOILA), an optimal and efficient feature subset acquisition framework. Unlike the common practice, which is to acquire features greedily, VOILA can reason over subsets of features. VOILA efficiently searches the space of possible feature subsets by discovering and exploiting conditional independence properties among the features, and it reuses probabilistic inference computations to further speed up the process. Through empirical evaluation on five medical datasets, we show that the greedy strategy is often reluctant to acquire features, as it cannot forecast the benefit of acquiring multiple features in combination.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Maryland > Prince George's County > College Park (0.14)
- South America > Paraguay > Asunción > Asunción (0.04)
- (2 more...)
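The failure mode the VOILA abstract describes — greedy acquisition missing features that only pay off in combination — can be illustrated with a minimal sketch. The toy cost model, function names, and acquisition costs below are hypothetical, not from the paper; the exhaustive subset search stands in for VOILA's lattice, which achieves the same subset-level reasoning far more efficiently by exploiting conditional independence.

```python
from itertools import chain, combinations

# Hypothetical toy model: the expected misclassification cost drops from 6
# to 1 only when BOTH f1 and f2 are known -- neither feature helps alone.
def expected_cost(acquired):
    return 1.0 if {"f1", "f2"} <= acquired else 6.0

ACQ_COST = {"f1": 1.0, "f2": 1.0}  # hypothetical per-feature acquisition costs

def total_cost(acquired):
    # Objective from the abstract: acquisition cost + misclassification cost.
    return expected_cost(acquired) + sum(ACQ_COST[f] for f in acquired)

def greedy_acquire():
    """Buy one feature at a time, only if it reduces total cost by itself."""
    acquired = set()
    while True:
        remaining = [f for f in ACQ_COST if f not in acquired]
        if not remaining:
            return acquired
        best = min(remaining, key=lambda f: total_cost(acquired | {f}))
        if total_cost(acquired | {best}) >= total_cost(acquired):
            return acquired  # no single feature pays for itself
        acquired.add(best)

def subset_acquire():
    """Reason over all feature subsets (what VOILA's lattice does efficiently)."""
    feats = list(ACQ_COST)
    subsets = chain.from_iterable(
        combinations(feats, r) for r in range(len(feats) + 1))
    return min((set(s) for s in subsets), key=total_cost)

print(greedy_acquire())  # greedy acquires nothing: each feature alone costs more than it saves
print(subset_acquire())  # subset-level search finds the complementary pair f1, f2
```

Greedy stops at the empty set (total cost 6.0) because acquiring either feature alone raises the total to 7.0, while the subset search finds that buying both reduces it to 3.0 — exactly the "cannot forecast the benefit of acquiring multiple features in combination" behavior the abstract reports.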