niña
NinA: Normalizing Flows in Action. Training VLA Models with Normalizing Flows
Tarasov, Denis, Nikulin, Alexander, Zisman, Ilya, Klepach, Albina, Lyubaykin, Nikita, Polubarov, Andrei, Derevyagin, Alexander, Kurenkov, Vladislav
Recent advances in Vision-Language-Action (VLA) models have established a two-component architecture, where a pre-trained Vision-Language Model (VLM) encodes visual observations and task descriptions, and an action decoder maps these representations to continuous actions. Diffusion models have been widely adopted as action decoders due to their ability to model complex, multimodal action distributions. However, they require multiple iterative denoising steps at inference time or downstream techniques to speed up sampling, limiting their practicality in real-world settings where high-frequency control is crucial. In this work, we present NinA (Normalizing Flows in Action), a fast and expressive alternative to diffusion-based decoders for VLAs. NinA replaces the diffusion action decoder with a Normalizing Flow (NF) that enables one-shot sampling through an invertible transformation, significantly reducing inference time. We integrate NinA into the FLOWER VLA architecture and fine-tune on the LIBERO benchmark. Our experiments show that NinA matches the performance of its diffusion-based counterpart under the same training regime, while achieving substantially faster inference. These results suggest that NinA offers a promising path toward efficient, high-frequency VLA control without compromising performance.
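The one-shot sampling property mentioned in the abstract can be illustrated with a single affine coupling layer, a common normalizing-flow building block. This is only a minimal sketch under stated assumptions: it is not NinA's actual decoder, the conditioners `toy_scale` and `toy_shift` are hypothetical stand-ins for learned networks, and conditioning on VLM features is omitted. It shows that a sample is produced in one forward pass and that the same transformation can be inverted exactly.

```python
import math
import random


def coupling_forward(z, scale_fn, shift_fn):
    """Map a latent vector z to an action in a single pass (sampling direction)."""
    half = len(z) // 2
    z1, z2 = z[:half], z[half:]
    # The first half passes through unchanged and parameterizes the second half.
    s, t = scale_fn(z1), shift_fn(z1)
    a2 = [zi * math.exp(si) + ti for zi, si, ti in zip(z2, s, t)]
    log_det = sum(s)  # log |det Jacobian| of the affine map
    return z1 + a2, log_det


def coupling_inverse(a, scale_fn, shift_fn):
    """Recover the latent vector from an action (density-evaluation direction)."""
    half = len(a) // 2
    a1, a2 = a[:half], a[half:]
    s, t = scale_fn(a1), shift_fn(a1)
    z2 = [(ai - ti) * math.exp(-si) for ai, si, ti in zip(a2, s, t)]
    return a1 + z2, -sum(s)


# Hypothetical toy conditioners; a real flow would use learned networks here.
def toy_scale(x):
    return [0.5 * math.tanh(v) for v in x]


def toy_shift(x):
    return [2.0 * v + 1.0 for v in x]


def sample_action(dim, rng):
    """Draw Gaussian noise and push it through the layer once (dim must be even)."""
    z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    a, _ = coupling_forward(z, toy_scale, toy_shift)
    return z, a
```

In contrast to a diffusion decoder, no iterative denoising loop appears anywhere in `sample_action`; a practical flow would stack several such layers, permuting dimensions between them.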
ReGraP-LLaVA: Reasoning enabled Graph-based Personalized Large Language and Vision Assistant
Xiang, Yifan, Zhang, Zhenxi, Li, Bin, Weng, Yixuan, Zhou, Shoujun, He, Yangfan, Li, Keqin
Recent advances in personalized MLLMs enable effective capture of user-specific concepts, supporting both recognition of personalized concepts and contextual captioning. However, humans typically explore and reason over relations among objects and individuals, transcending surface-level information to achieve more personalized and contextual understanding. In this respect, existing methods may face three main limitations: (1) their training data lacks multi-object sets in which relations among objects are learnable; (2) building on this limited training data, their models overlook the relations between different personalized concepts and fail to reason over them; (3) their experiments mainly focus on a single personalized concept, with evaluations limited to recognition and captioning tasks. To address these limitations, we present a new dataset named ReGraP, consisting of 120 sets of personalized knowledge. Each set includes images, knowledge graphs (KGs), and chain-of-thought (CoT) QA pairs derived from the KGs, enabling more structured and sophisticated reasoning pathways. We propose ReGraP-LLaVA, an MLLM trained with the corresponding KGs and CoT QA pairs, where soft and hard graph prompting methods are designed to align KGs within the model's semantic space. We establish the ReGraP Benchmark, which contains diverse task types: multiple-choice, fill-in-the-blank, True/False, and descriptive questions in both open- and closed-ended settings. The proposed benchmark is designed to evaluate the relational reasoning and knowledge-connection capability of personalized MLLMs. We conduct experiments on the proposed ReGraP-LLaVA and other competitive MLLMs. Results show that the proposed model not only learns personalized knowledge but also performs relational reasoning in its responses, achieving state-of-the-art performance compared with competitive methods. All code and datasets are released at: https://github.com/xyfyyds/ReGraP.
Digital 'immortality' is coming and we're not ready for it
In the 1990 fantasy drama Truly, Madly, Deeply, lead character Nina (Juliet Stevenson) is grieving the recent death of her boyfriend Jamie (Alan Rickman). Sensing her profound sadness, Jamie returns as a ghost to help her process her loss. If you've seen the film, you'll know that his reappearance forces her to question her memory of him and, in turn, accept that maybe he wasn't as perfect as she'd remembered. Here in 2023, a new wave of AI-based "grief tech" offers us all the chance to spend time with loved ones after their death, in varying forms. But unlike Jamie (who benevolently misleads Nina), we're being asked to let artificial intelligence serve up a version of those we survive.
Large Language Models are In-Context Semantic Reasoners rather than Symbolic Reasoners
Tang, Xiaojuan, Zheng, Zilong, Li, Jiaqi, Meng, Fanxu, Zhu, Song-Chun, Liang, Yitao, Zhang, Muhan
The emergent few-shot reasoning capabilities of Large Language Models (LLMs) have excited the natural language and machine learning community over recent years. Despite numerous successful applications, the underlying mechanism of such in-context capabilities remains unclear. In this work, we hypothesize that the learned semantics of language tokens do most of the heavy lifting during the reasoning process. Unlike humans' symbolic reasoning process, the semantic representations of LLMs could create strong connections among tokens, thus composing a superficial logical chain. To test our hypothesis, we decouple semantics from the language reasoning process and evaluate three kinds of reasoning abilities, i.e., deduction, induction and abduction. Our findings reveal that semantics play a vital role in LLMs' in-context reasoning: LLMs perform significantly better when semantics are consistent with commonsense but struggle to solve symbolic or counter-commonsense reasoning tasks by leveraging new in-context knowledge. These surprising observations question whether modern LLMs have mastered the inductive, deductive and abductive reasoning abilities found in human intelligence, and motivate research on unveiling the magic within black-box LLMs. On the whole, our analysis provides a novel perspective on the role of semantics in developing and evaluating language models' reasoning abilities. Code is available at https://github.com/XiaojuanTang/ICSR.
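One plausible way to picture "decoupling semantics" from a reasoning prompt is to replace meaningful entity and relation names with abstract symbols while leaving the logical structure intact. The sketch below is an assumption made for illustration, not the authors' released code: the helper names `build_mapping` and `symbolize`, the `e1`/`r1` naming scheme, and the example prompt are all hypothetical.

```python
import re


def build_mapping(entities, relations):
    """Assign abstract symbols: e1, e2, ... for entities and r1, r2, ... for relations."""
    mapping = {name: f"e{i}" for i, name in enumerate(entities, start=1)}
    mapping.update({name: f"r{i}" for i, name in enumerate(relations, start=1)})
    return mapping


def symbolize(text, mapping):
    """Replace each meaningful term with its symbol, matching whole words only."""
    # Longer terms come first so "grandparent" is never shadowed by "parent".
    terms = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], text)


# Hypothetical deduction prompt used only for illustration.
prompt = (
    "Alice is the parent of Bob. Bob is the parent of Carol. "
    "Is Alice the grandparent of Carol?"
)
mapping = build_mapping(["Alice", "Bob", "Carol"], ["grandparent", "parent"])
symbolic_prompt = symbolize(prompt, mapping)
print(symbolic_prompt)
# prints: e1 is the r2 of e2. e2 is the r2 of e3. Is e1 the r1 of e3?
```

The symbolic version no longer lets a model lean on what "parent" or "grandparent" means; answering it requires a rule such as "r2 of r2 implies r1" to be supplied in context.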
Voice + AI Is Coming To The Workplace Loud And Clear
Virtual assistants turn 16 this year and you don't have to look too hard – or speak too loudly – to find them. In fact, there will be around 8 billion voice-based devices by 2023 – more than the world's population today. From Amazon's Echo and Google's Assistant to Apple's Siri, Samsung's Bixby and Microsoft's Cortana, billions of people around the world are using their voices every day to schedule appointments, get directions, play music or get answers quickly, all things that once required us to tediously type or write. Even Twitter recently announced that users can now audio tweet their inner musings. And yet, despite widespread adoption of voice-based devices in our personal lives, applications based on voice are nowhere near as pervasive in our professional lives as they are in our homes.
We Met In May review – cute dating sim is a witty ode to early love
For the last five years, the independent game designer Nina Freeman, working with small teams of collaborators, has been exploring the boundaries and connections between video games, art and autobiography. Her witty, ethereal projects often involve her own experiences with family and lovers, and tease relatable truths from the most subtle interactions: a girl learning about sex while playing with dolls; a young woman's online relationship explored through the folders on her PC desktop. As a "player", your role is often negligible, flitting between embodiment, friendship and voyeurism. We Met in May is a set of four vignette games about the early moments in a romantic relationship, ostensibly between Nina herself and the game's programmer, Jake Jefferies. In Nothing to Hide, Nina has invited Jake back to her flat for the first time and, bashful about its untidiness and her collection of anime plushies and posters, considers hiding things from him – it's up to the player to decide what she conceals.
Natural Language Processing - Current Applications and Future Possibilities
A 2017 Tractica report on the natural language processing (NLP) market estimates the total NLP software, hardware, and services market opportunity to be around $22.3 billion by 2025. The report also forecasts that NLP software solutions leveraging AI will see market growth from $136 million in 2016 to $5.4 billion by 2025. In order to shed more light on the growing applications of NLP solutions, Dan Faggella, the CEO of TechEmergence, converses with Vlad Sejnoha, the CTO of Nuance Communications, an organization offering AI and NLP solutions in voice, natural language understanding, reasoning and systems integration. Vlad Sejnoha has been the Senior Vice President and CTO at Nuance since 2001. He holds a Master's degree in Electrical Engineering from McGill University. Vlad has been working in the field of NLP and speech recognition for over 30 years and holds 22 patents to date.
Putting Intelligent Characters to Work
Extempo Systems, Inc., was founded in 1995 to commercialize intelligent characters. Our team built innovative software and novel applications for several markets. We had some early-adopting customers during the Internet boom, but the company could not survive the significant downturn in corporate IT spending when the bubble burst. In 2004, Extempo ceased operations and was formally liquidated. Although our commercial venture failed, we advanced the technology for intelligent characters and learned a lot about how (not) to take them to market.
How Banks Are Leveraging Chatbots for Customer Service | Crowdfund Insider
We've been hearing a lot of talk about chatbots lately. Some of these conversations are about the ways that companies are having success with chatbots, but others are about brands that have failed to implement them properly. A chatbot is a piece of software that can simulate human conversation. A human types or speaks a request, and the AI chatbot processes the language and provides the appropriate response. Depending on the programming, the functionality can vary significantly.