Robotic Environmental State Recognition with Pre-Trained Vision-Language Models and Black-Box Optimization

arXiv.org Artificial Intelligence

For example, the robot must recognize whether a door is open, a light is on, water is running, a fire is burning, and so on. Because the robot changes its behavior based on the recognition result, state recognition is usually performed over discrete values with about two or three options. Until now, a separate method has been chosen for each state to be recognized, such as hand-programmed processing of images or point clouds [3, 4], creating an annotated dataset and training neural networks [5], or detecting the state by installing new sensors [6, 7]. However, these methods require manually programming the process for each state, training neural networks one by one, or installing additional sensors. In addition, the number of programs and trained models grows with each state to be recognized, which causes problems in managing source code and computing resources. To cope with these problems, a single program or model should be able to recognize multiple states. In this study, we propose a method to easily recognize various environmental states in a unified manner through spoken language (Figure 1). To perform state recognition through spoken language, we use pre-trained large-scale vision-language models (VLMs) [8-12]. Currently, VLMs are being used for map generation [13, 14], scene understanding [15-17], and feature extraction for behav-


VQA-based Robotic State Recognition Optimized with Genetic Algorithm

arXiv.org Artificial Intelligence

State recognition of objects and the environment by robots has been conducted in various ways. In most cases, it is done by processing point clouds, learning from annotated images, or using specialized sensors. In contrast, in this study, we propose a state recognition method that applies Visual Question Answering (VQA) in a Pre-Trained Vision-Language Model (PTVLM) trained on a large-scale dataset. By using VQA, robotic state recognition can be described intuitively in spoken language. However, there are many possible ways to ask about the same event, and recognition performance differs depending on the question. Therefore, to improve the performance of state recognition using VQA, we search for an appropriate combination of questions using a genetic algorithm. We show that our system can recognize not only the open/closed state of a refrigerator door and the on/off state of a display, but also the open/closed state of a transparent door and the state of water, which have been difficult to recognize.
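The idea of searching for a good combination of questions can be illustrated with a small sketch. This is not the authors' implementation. It assumes the yes/no answers of a VQA model have already been collected into a table (one row per image, one column per candidate question), that the selected questions are combined by majority vote, and that fitness is plain accuracy on labeled images. The names `vote_accuracy` and `evolve_question_mask`, the toy data, and all hyperparameters are hypothetical.

```python
import random


def vote_accuracy(mask, answers, labels):
    """Accuracy of a majority vote over the questions selected by `mask`.

    answers[i][j] is 1 if question j was answered "yes" for image i, else 0.
    Ties and empty selections count as "no" / zero accuracy (an assumption).
    """
    chosen = [j for j, bit in enumerate(mask) if bit]
    if not chosen:
        return 0.0
    correct = 0
    for row, label in zip(answers, labels):
        yes_votes = sum(row[j] for j in chosen)
        predicted = 1 if 2 * yes_votes > len(chosen) else 0
        correct += predicted == label
    return correct / len(labels)


def evolve_question_mask(answers, labels, pop_size=20, generations=30,
                         mutation_rate=0.1, seed=0):
    """Search for a bitmask of questions with a simple genetic algorithm."""
    rng = random.Random(seed)
    n_questions = len(answers[0])  # must be at least 2 for crossover

    def fitness(mask):
        return vote_accuracy(mask, answers, labels)

    population = [[rng.randint(0, 1) for _ in range(n_questions)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the better half unchanged (elitism), refill with children.
        population.sort(key=fitness, reverse=True)
        elite = population[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            parent_a, parent_b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_questions)  # one-point crossover
            child = parent_a[:cut] + parent_b[cut:]
            # Flip each bit with probability `mutation_rate`.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = elite + children
    best = max(population, key=fitness)
    return best, fitness(best)


# Toy data (made up): 1 = state is "open", 0 = "closed".
labels = [1, 0, 1, 0, 1, 1]
# Question 0 and 3 answer correctly, question 1 is always wrong,
# question 2 always says "yes".
answers = [[label, 1 - label, 1, label] for label in labels]

best_mask, best_accuracy = evolve_question_mask(answers, labels)
print("selected questions:", best_mask, "accuracy:", best_accuracy)
```

In a real system the `answers` table would be filled by querying the vision-language model with each candidate question on each labeled image, which is the expensive step; the search itself only reuses those cached answers.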


The same frightening woman keeps appearing in AI-generated images - Creepy AI

#artificialintelligence

Loab was created by negatively weighted prompts, and her macabre appearance consistently turns up in images, even when the AI is directed away from Loab's prompts. The creepy figure was discovered by Supercomposite, a Swedish musician. "I discovered this woman, who I call Loab, in April. The AI reproduced her more easily than most celebrities. Her presence is persistent, and she haunts every image she touches," writes Supercomposite.


Metapix – Metapix makes your images look creative with funny photo effects in seconds with the help of AI.

#artificialintelligence

Celebrate a birthday with our Happy Birthday effect, then enjoy it and share it with your friends and family. Make your photo magical with the Birthday effect. These photo effects are amazing and stunning. They are ideal for your memories and make them unforgettable.


SnapArt – SnapArt makes your images look creative with funny photo effects in seconds with the help of AI.

#artificialintelligence

You have visited an art gallery and been amazed not only by the art collection but also by the frames around the art. Frames create new perspectives on images and make them more relevant. We give you a chance to let loose your inner artist and create amazing frames for your images.


Do you know which inputs your neural network likes most? :: Päpper's Coding Blog -- Have fun coding.

#artificialintelligence

Recent advances in training deep neural networks have led to a whole bunch of impressive machine learning models which are able to tackle a very diverse range of tasks. When you are developing such a model, one of the notable downsides is that it is considered a "black-box" approach, in the sense that your model learns from the data you feed it, but you don't really know what is going on inside the model. To make it clearer: you don't really know what your model actually learned, and if you have a flaw in your training or data approach, it might work well according to your metrics while having learnt the wrong thing. As a self-respecting developer you want to do better than that, so today I will show you a method you can use to get some better introspection into your model by using visualization techniques. So what is a visualization technique when we talk about deep neural networks?