 liquid


Inferring Dynamic Physical Properties from Video Foundation Models

Zhan, Guanqi, Ma, Xianzheng, Xie, Weidi, Zisserman, Andrew

arXiv.org Artificial Intelligence

We study the task of predicting dynamic physical properties from videos. More specifically, we consider physical properties that require temporal information to be inferred: elasticity of a bouncing object, viscosity of a flowing liquid, and dynamic friction of an object sliding on a surface. To this end, we make the following contributions: (i) We collect a new video dataset for each physical property, consisting of synthetic training and testing splits, as well as a real split for real world evaluation. (ii) We explore three ways to infer the physical property from videos: (a) an oracle method where we supply the visual cues that intrinsically reflect the property using classical computer vision techniques; (b) a simple read out mechanism using a visual prompt and trainable prompt vector for cross-attention on pre-trained video generative and self-supervised models; and (c) prompt strategies for Multi-modal Large Language Models (MLLMs). (iii) We show that video foundation models trained in a generative or self-supervised manner achieve a similar performance, though behind that of the oracle, and MLLMs are currently inferior to the other models, though their performance can be improved through suitable prompting.
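As an illustration of a visual cue that reflects elasticity, the following is a minimal sketch, not the authors' oracle method. It assumes the peak heights of successive bounces have already been extracted from the video by some tracker, and it uses the standard relation that the coefficient of restitution equals the square root of the ratio of consecutive peak heights. The function name `restitution_from_peaks` and the sample heights are hypothetical.

```python
import math


def restitution_from_peaks(peak_heights):
    """Estimate the coefficient of restitution from successive bounce peaks.

    Assumes peak_heights holds the maximum height reached after each bounce,
    in any consistent unit (hypothetical input, e.g. from an object tracker).
    """
    # For an ideal bounce, e = sqrt(h_next / h_prev) for consecutive peaks.
    ratios = [
        math.sqrt(h_next / h_prev)
        for h_prev, h_next in zip(peak_heights, peak_heights[1:])
        if h_prev > 0 and h_next >= 0
    ]
    if not ratios:
        raise ValueError("need at least two positive peak heights")
    # Average the per-bounce estimates to reduce measurement noise.
    return sum(ratios) / len(ratios)


# Hypothetical peak heights in metres for three consecutive bounces.
peaks = [1.0, 0.64, 0.4096]
e = restitution_from_peaks(peaks)
```

Averaging over several bounces is a design choice for robustness to tracking noise; a single pair of peaks would also give an estimate.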


Autonomous Close-Proximity Photovoltaic Panel Coating Using a Quadcopter

Jacquemont, Dimitri, Bosio, Carlo, Yang, Teaya, Zhang, Ruiqi, Orun, Ozgur, Li, Shuai, Alam, Reza, Schutzius, Thomas M., Makiharju, Simo A., Mueller, Mark W.

arXiv.org Artificial Intelligence

Photovoltaic (PV) panels are becoming increasingly widespread in the domain of renewable energy, and thus, small efficiency gains can have massive effects. Anti-reflective and self-cleaning coatings enhance panel performance but degrade over time, requiring periodic reapplication. Uncrewed Aerial Vehicles (UAVs) offer a flexible and autonomous way to apply protective coatings more often and at lower cost compared to traditional manual coating methods. In this letter, we propose a quadcopter-based system, equipped with a liquid dispersion mechanism, designed to automate such tasks. The localization stack only uses onboard sensors, relying on visual-inertial odometry and the relative position of the PV panel detected with respect to the quadcopter. The control relies on a model-based controller that accounts for the ground effect and the mass decrease of the quadcopter during liquid dispersion. We validate the autonomy capabilities of our system through extensive indoor and outdoor experiments.


Robotic Skill Diversification via Active Mutation of Reward Functions in Reinforcement Learning During a Liquid Pouring Task

van Buuren, Jannick, Giglio, Roberto, Roveda, Loris, Peternel, Luka

arXiv.org Artificial Intelligence

This paper explores how deliberate mutations of the reward function in reinforcement learning can produce diversified skill variations in robotic manipulation tasks, examined with a liquid pouring use case. To this end, we developed a new reward function mutation framework that is based on applying Gaussian noise to the weights of the different terms in the reward function. Inspired by the cost-benefit tradeoff model from human motor control, we designed the reward function with the following key terms: accuracy, time, and effort. The study was performed in a simulation environment created in NVIDIA Isaac Sim, and the setup included a Franka Emika Panda robotic arm holding a glass with a liquid that needed to be poured into a container. The reinforcement learning algorithm was based on Proximal Policy Optimization. We systematically explored how different configurations of mutated weights in the reward function would affect the learned policy. The resulting policies exhibit a wide range of behaviours: from variations in execution of the originally intended pouring task to novel skills useful for unexpected tasks, such as container rim cleaning, liquid mixing, and watering. This approach offers promising directions for robotic systems to perform diversified learning of specific tasks, while also potentially deriving meaningful skills for future tasks.
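The weight mutation described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the base weights, the noise scale `sigma`, the additive form of the noise, and the linear combination of terms are all hypothetical choices. Only the three term names (accuracy, time, effort) and the use of Gaussian noise on the weights come from the abstract.

```python
import random

# Hypothetical base weights for the three reward terms named in the abstract.
BASE_WEIGHTS = {"accuracy": 1.0, "time": 1.0, "effort": 1.0}


def mutate_weights(weights, sigma, rng):
    """Return a copy of the weights with zero-mean Gaussian noise added to each.

    sigma is an assumed noise scale; the abstract does not specify one.
    """
    return {name: w + rng.gauss(0.0, sigma) for name, w in weights.items()}


def reward(terms, weights):
    """Combine per-term values into one scalar as a weighted sum (assumed form)."""
    return sum(weights[name] * terms[name] for name in weights)


rng = random.Random(0)  # seeded so the mutation is reproducible
mutated = mutate_weights(BASE_WEIGHTS, sigma=0.5, rng=rng)

# Hypothetical per-step term values: accuracy is rewarded, time and effort cost.
step_terms = {"accuracy": 1.0, "time": -0.5, "effort": -0.2}
base_reward = reward(step_terms, BASE_WEIGHTS)
mutated_reward = reward(step_terms, mutated)
```

In such a setup, each mutated weight set would be used to train a separate policy, and the resulting behaviours compared.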


M$^3$-VOS: Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation

Chen, Zixuan, Li, Jiaxin, Tan, Liming, Guo, Yejie, Liang, Junxuan, Lu, Cewu, Li, Yong-Lu

arXiv.org Artificial Intelligence

Intelligent robots need to interact with diverse objects across various environments. The appearance and state of objects frequently undergo complex transformations depending on the object properties, e.g., phase transitions. However, in the vision community, segmenting dynamic objects with phase transitions is overlooked. In light of this, we introduce the concept of phase in segmentation, which categorizes real-world objects based on their visual characteristics and potential morphological and appearance changes. Then, we present a new benchmark, Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation (M$^3$-VOS), to verify the ability of models to understand object phases, which consists of 479 high-resolution videos spanning over 10 distinct everyday scenarios. It provides dense instance mask annotations that capture both object phases and their transitions. We evaluate state-of-the-art methods on M$^3$-VOS, yielding several key insights. Notably, current appearance-based approaches show significant room for improvement when handling objects with phase transitions. The inherent changes in disorder suggest that the predictive performance of the forward entropy-increasing process can be improved through a reverse entropy-reducing process. These findings lead us to propose ReVOS, a new plug-and-play model that improves its performance by reversal refinement. Our data and code will be publicly available at https://zixuan-chen.github.io/M-cubeVOS.github.io/.


Bacteria-powered artificial tongue can taste-test alcohol for additives

Popular Science

A tiny device home to genetically modified bacteria may soon function like an artificial tongue that rapidly analyzes an alcoholic drink's chemical composition. Using existing biological nanopore technology that underpins DNA sequencing, these new tools could even one day test whether or not a beverage is contaminated with unwanted additives, or even deadly toxins. Current nanopore technology relies on a modified bacterium, usually Mycobacterium smegmatis, to perform microscopic chemical assessments. To accomplish this, experts first create extremely tiny holes only a few nanometers wide in the bacteria's cell membrane. Researchers then mix the altered organisms into a liquid before applying a small electrical charge to the solution.


Learning Autonomous Surgical Irrigation and Suction with the da Vinci Research Kit Using Reinforcement Learning

Ou, Yafei, Tavakoli, Mahdi

arXiv.org Artificial Intelligence

The irrigation-suction process is a common procedure to rinse and clean up the surgical field in minimally invasive surgery (MIS). In this process, surgeons first irrigate liquid, typically saline, into the surgical scene for rinsing and diluting the contaminant, and then suction the liquid out of the surgical field. While recent advances have shown promising results in the application of reinforcement learning (RL) for automating surgical subtasks, fewer studies have explored the automation of fluid-related tasks. In this work, we explore the automation of both steps in the irrigation-suction procedure and train two vision-based RL agents to complete irrigation and suction autonomously. To achieve this, a platform is developed for creating simulated surgical robot learning environments and for training agents, and two simulated learning environments are built for irrigation and suction with visually plausible fluid rendering capabilities. With techniques such as domain randomization (DR) and carefully designed reward functions, two agents are trained in the simulator and transferred to the real world. Individual evaluations of both agents show satisfactory real-world results. With an initial amount of around 5 grams of contaminants, the irrigation agent ultimately achieved an average of 2.21 grams remaining after a manual suction. As a comparison, fully manual operation by a human results in 1.90 grams remaining. The suction agent achieved 2.64 and 2.24 grams of liquid remaining across two trial groups with more than 20 and 30 grams of initial liquid in the container. Fully autonomous irrigation-suction trials reduce the contaminant in the container from around 5 grams to an average of 2.42 grams, although yielding a higher total weight remaining (4.40) due to residual liquid not suctioned. Further information about the project is available at https://tbs-ualberta.github.io/CRESSim/.


Vision-Language Model-based Physical Reasoning for Robot Liquid Perception

Lai, Wenqiang, Gao, Yuan, Lam, Tin Lun

arXiv.org Artificial Intelligence

There is a growing interest in applying large language models (LLMs) in robotic tasks, due to their remarkable reasoning ability and extensive knowledge learned from vast training corpora. Grounding LLMs in the physical world remains an open challenge as they can only process textual input. Recent advancements in large vision-language models (LVLMs) have enabled a more comprehensive understanding of the physical world by incorporating visual input, which provides richer contextual information than language alone. In this work, we proposed a novel paradigm that leveraged GPT-4V(ision), the state-of-the-art LVLM by OpenAI, to enable embodied agents to perceive liquid objects via image-based environmental feedback. Specifically, we exploited the physical understanding of GPT-4V to interpret the visual representation (e.g., time-series plot) of non-visual feedback (e.g., F/T sensor data), indirectly enabling multimodal perception beyond vision and language using images as proxies. We evaluated our method using 10 common household liquids with containers of various geometry and material. Without any training or fine-tuning, we demonstrated that our method can enable the robot to indirectly perceive the physical response of liquids and estimate their viscosity. We also showed that by jointly reasoning over the visual and physical attributes learned through interactions, our method could recognize liquid objects in the absence of strong visual cues (e.g., container labels with legible text or symbols), increasing the accuracy from 69.0% -- achieved by the best-performing vision-only variant -- to 86.0%.


Liquid State Genetic Programming

Oltean, Mihai

arXiv.org Artificial Intelligence

A new Genetic Programming variant called Liquid State Genetic Programming (LSGP) is proposed in this paper. LSGP is a hybrid method combining a dynamic memory for storing the inputs (the liquid) and a Genetic Programming technique used for the problem solving part. Several numerical experiments with LSGP are performed on a set of benchmark problems. The experiments show that LSGP performs similarly to, and sometimes even better than, standard Genetic Programming for the considered test problems.


Scientists discover for the first time that sperm defy one of Newton's laws of PHYSICS

Daily Mail - Science & tech

Scientists have discovered that the way sperm swim defies Newton's third law of motion, which states that every action has an equal and opposite reaction. Researchers at Kyoto University found that the sperm's flagellum, or tail, propels the cell forward by changing its shape to interact with the fluid. Sperm do so in a non-reciprocal way, which violates Newton's third law because they do not elicit an equal and opposite reaction from their surroundings. The flagellum's elasticity also suggests that there should be no movement at all, but instead, sperm whip their tails without releasing much energy into their surroundings. The team used human sperm cells and algae for the research because both have flagella that help them propel through the liquid, New Scientist reports.


Shared Telemanipulation with VR controllers in an anti slosh scenario

Grobbel, Max, Varga, Balint, Hohmann, Sören

arXiv.org Artificial Intelligence

Telemanipulation has become a promising technology that combines human intelligence with robotic capabilities to perform tasks remotely. However, it faces several challenges such as insufficient transparency, low immersion, and limited feedback to the human operator. Moreover, the high cost of haptic interfaces is a major limitation for the application of telemanipulation in various fields, including elder care, where our research is focused. To address these challenges, this paper proposes the usage of nonlinear model predictive control for telemanipulation using low-cost virtual reality controllers, including multiple control goals in the objective function. The framework utilizes models for human input prediction and task-related models of the robot and the environment. The proposed framework is validated on a UR5e robot arm in the scenario of handling liquid without spilling. Further extensions of the framework such as pouring assistance and collision avoidance can easily be included.