Robotic Environmental State Recognition with Pre-Trained Vision-Language Models and Black-Box Optimization
Kawaharazuka, Kento, Obinata, Yoshiki, Kanazawa, Naoaki, Okada, Kei, Inaba, Masayuki
For example, the robot must recognize whether a door is open, a light is on, water is running, a fire is burning, and so on. In order to change the robot's behavior based on the recognition results, state recognition is usually performed with discrete values of about two or three options. Until now, appropriate individual methods have been used for each state to be recognized, such as direct processing of images or point clouds by human programming [3, 4], creating a dataset with annotations and training neural networks [5], or detecting the state by installing new sensors [6, 7]. However, these methods require us to manually program the process for each state recognition, to train neural networks one by one, and to increase the number of sensors installed. In addition, this increases the number of programs and trained models needed for each state recognition, which causes problems in the management of source code and computing resources. To cope with these problems, a single program or model should be able to recognize multiple states. In this study, we propose a method to easily recognize various environmental states in a unified manner and through the spoken language (Figure 1). In order to perform state recognition through the spoken language, we use pre-trained large-scale vision-language models (VLMs) [8-12]. Currently, VLMs are being used for map generation [13, 14], scene understanding [15-17], and feature extraction for behav-
VQA-based Robotic State Recognition Optimized with Genetic Algorithm
Kawaharazuka, Kento, Obinata, Yoshiki, Kanazawa, Naoaki, Okada, Kei, Inaba, Masayuki
State recognition of objects and environment in robots has been conducted in various ways. In most cases, this is executed by processing point clouds, learning from annotated images, or using specialized sensors. In contrast, in this study, we propose a state recognition method that applies Visual Question Answering (VQA) in a Pre-Trained Vision-Language Model (PTVLM) trained from a large-scale dataset. By using VQA, it is possible to intuitively describe robotic state recognition in the spoken language. On the other hand, there are various possible ways to ask about the same event, and the performance of state recognition differs depending on the question. Therefore, in order to improve the performance of state recognition using VQA, we search for an appropriate combination of questions using a genetic algorithm. We show that our system can recognize not only the open/closed state of a refrigerator door and the on/off state of a display, but also the open/closed state of a transparent door and the state of water, which have been difficult to recognize.
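The idea of searching for a good combination of questions with a genetic algorithm can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the abstract does not specify how answers are combined or which genetic operators are used, so this sketch uses a majority vote over the selected questions, tournament selection, uniform crossover, bit-flip mutation, and elitism. The yes/no answers are simulated with the hypothetical helper `simulate` rather than produced by a real VQA model.

```python
import random


def vote(mask, answers_for_image):
    """Majority vote over the answers of the selected questions (ties count as False)."""
    picked = [a for use, a in zip(mask, answers_for_image) if use]
    return sum(picked) * 2 > len(picked)


def fitness(mask, answers, labels):
    """Fraction of images whose voted state matches the label; 0.0 if no question is selected."""
    if not any(mask):
        return 0.0
    hits = sum(vote(mask, row) == label for row, label in zip(answers, labels))
    return hits / len(labels)


def simulate(n_images=60, accuracies=(0.9, 0.8, 0.75, 0.55, 0.5, 0.45), seed=1):
    """Hypothetical stand-in for a VQA model: question j answers correctly with probability accuracies[j]."""
    rng = random.Random(seed)
    labels = [rng.random() < 0.5 for _ in range(n_images)]
    answers = [[lab if rng.random() < acc else not lab for acc in accuracies]
               for lab in labels]
    return answers, labels


def evolve(answers, labels, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Search for a boolean mask over questions that maximizes voting accuracy."""
    rng = random.Random(seed)
    n_questions = len(answers[0])
    pop = [[rng.random() < 0.5 for _ in range(n_questions)] for _ in range(pop_size)]

    def score(mask):
        return fitness(mask, answers, labels)

    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        history.append(score(pop[0]))
        next_pop = pop[:2]  # elitism: carry the two best masks over unchanged
        while len(next_pop) < pop_size:
            # Tournament selection of two parents, three candidates each.
            a = max(rng.sample(pop, 3), key=score)
            b = max(rng.sample(pop, 3), key=score)
            # Uniform crossover followed by bit-flip mutation.
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            child = [not g if rng.random() < p_mut else g for g in child]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=score)
    history.append(score(best))
    return best, score(best), history


if __name__ == "__main__":
    answers, labels = simulate()
    best, best_score, _ = evolve(answers, labels)
    print("selected questions:", [i for i, use in enumerate(best) if use])
    print("voting accuracy on the simulated data:", best_score)
```

In a real setting, each row of `answers` would hold the yes/no replies of the vision-language model to every candidate question for one labeled image, and the fitness would be measured on held-out images to avoid overfitting the question set.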
The same frightening woman keeps appearing in AI-generated images - Creepy AI
Loab was created by negatively weighted prompts, and her macabre appearance consistently turns up in images, even when the AI is directed away from Loab's prompts. The creepy figure was discovered by Supercomposite, a Swedish musician. "I discovered this woman, who I call Loab, in April. The AI reproduced her more easily than most celebrities. Her presence is persistent, and she haunts every image she touches," writes Supercomposite.
Do you know which inputs your neural network likes most? :: Päpper's Coding Blog -- Have fun coding.
Recent advances in training deep neural networks have led to a whole bunch of impressive machine learning models which are able to tackle a very diverse range of tasks. When you are developing such a model, one of the notable downsides is that it is considered a "black-box" approach in the sense that your model learns from the data you feed it, but you don't really know what is going on inside the model. To make it clearer: you don't really know what your model actually learned, and if you have a flaw in your training / data approach, it might work well according to your metrics while having learnt the wrong thing. As a self-respecting developer you want to do better than that, so today I will show you a method you can use to get some better introspection into your model by using visualization techniques. So what is a visualization technique when we talk about deep neural networks?