Plotting

 Wang, Jin


CapsuleBot: A Novel Compact Hybrid Aerial-Ground Robot with Two Actuated-wheel-rotors

arXiv.org Artificial Intelligence

This paper presents the design, modeling, and experimental validation of CapsuleBot, a compact hybrid aerial-ground vehicle designed for long-term covert reconnaissance. CapsuleBot combines the manoeuvrability of bicopter in the air with the energy efficiency and noise reduction of ground vehicles on the ground. To accomplish this, a structure named actuated-wheel-rotor has been designed, utilizing a sole motor for both the unilateral rotor tilting in the bicopter configuration and the wheel movement in ground mode. CapsuleBot comes equipped with two of these structures, enabling it to attain hybrid aerial-ground propulsion with just four motors. Importantly, the decoupling of motion modes is achieved without the need for additional drivers, enhancing the versatility and robustness of the system. Furthermore, we have designed the full dynamics and control for aerial and ground locomotion based on the bicopter model and the two-wheeled self-balancing vehicle model. The performance of CapsuleBot has been validated through experiments. The results demonstrate that CapsuleBot produces 40.53% less noise in ground mode and consumes 99.35% less energy, highlighting its potential for long-term covert reconnaissance applications.


Language models as master equation solvers

arXiv.org Artificial Intelligence

Master equations are of fundamental importance in modeling stochastic dynamical systems.However, solving master equations is challenging due to the exponential increase in the number of possible states or trajectories with the dimension of the state space. In this study, we propose repurposing language models as a machine learning approach to solve master equations. We design a prompt-based neural network to map rate parameters, initial conditions, and time values directly to the state joint probability distribution that exactly matches the input contexts. In this way, we approximate the solution of the master equation in its most general form. We train the network using the policy gradient algorithm within the reinforcement learning framework, with feedback rewards provided by a set of variational autoregressive models. By applying this approach to representative examples, we observe high accuracy for both multi-module and high-dimensional systems. The trained network also exhibits extrapolating ability, extending its predictability to unseen data. Our findings establish the connection between language models and master equations, highlighting the possibility of using a single pretrained large model to solve any master equation.


Technical note: ShinyAnimalCV: open-source cloud-based web application for object detection, segmentation, and three-dimensional visualization of animals using computer vision

arXiv.org Artificial Intelligence

Computer vision (CV), a non-intrusive and cost-effective technology, has furthered the development of precision livestock farming by enabling optimized decision-making through timely and individualized animal care. The availability of affordable two- and three-dimensional camera sensors, combined with various machine learning and deep learning algorithms, has provided a valuable opportunity to improve livestock production systems. However, despite the availability of various CV tools in the public domain, applying these tools to animal data can be challenging, often requiring users to have programming and data analysis skills, as well as access to computing resources. Moreover, the rapid expansion of precision livestock farming is creating a growing need to educate and train animal science students in CV. This presents educators with the challenge of efficiently demonstrating the complex algorithms involved in CV. Thus, the objective of this study was to develop ShinyAnimalCV, an open-source cloud-based web application. This application provides a user-friendly interface for performing CV tasks, including object segmentation, detection, three-dimensional surface visualization, and extraction of two- and three-dimensional morphological features. Nine pre-trained CV models using top-view animal data are included in the application. ShinyAnimalCV has been deployed online using cloud computing platforms. The source code of ShinyAnimalCV is available on GitHub, along with detailed documentation on training CV models using custom data and deploying ShinyAnimalCV locally to allow users to fully leverage the capabilities of the application. ShinyAnimalCV can contribute to CV research and teaching in the animal science community.


Depth video data-enabled predictions of longitudinal dairy cow body weight using thresholding and Mask R-CNN algorithms

arXiv.org Artificial Intelligence

Monitoring cow body weight is crucial to support farm management decisions due to its direct relationship with the growth, nutritional status, and health of dairy cows. Cow body weight is a repeated trait, however, the majority of previous body weight prediction research only used data collected at a single point in time. Furthermore, the utility of deep learning-based segmentation for body weight prediction using videos remains unanswered. Therefore, the objectives of this study were to predict cow body weight from repeatedly measured video data, to compare the performance of the thresholding and Mask R-CNN deep learning approaches, to evaluate the predictive ability of body weight regression models, and to promote open science in the animal science community by releasing the source code for video-based body weight prediction. A total of 40,405 depth images and depth map files were obtained from 10 lactating Holstein cows and 2 non-lactating Jersey cows. Three approaches were investigated to segment the cow's body from the background, including single thresholding, adaptive thresholding, and Mask R-CNN. Four image-derived biometric features, such as dorsal length, abdominal width, height, and volume, were estimated from the segmented images. On average, the Mask-RCNN approach combined with a linear mixed model resulted in the best prediction coefficient of determination and mean absolute percentage error of 0.98 and 2.03%, respectively, in the forecasting cross-validation. The Mask-RCNN approach was also the best in the leave-three-cows-out cross-validation. The prediction coefficients of determination and mean absolute percentage error of the Mask-RCNN coupled with the linear mixed model were 0.90 and 4.70%, respectively. Our results suggest that deep learning-based segmentation improves the prediction performance of cow body weight from longitudinal depth video data.


Catch Planner: Catching High-Speed Targets in the Flight

arXiv.org Artificial Intelligence

Catching high-speed targets in the flight is a complex and typical highly dynamic task. In this paper, we propose Catch Planner, a planning-with-decision scheme for catching. For sequential decision making, we propose a policy search method based on deep reinforcement learning. In order to make catching adaptive and flexible, we propose a trajectory optimization method to jointly optimize the highly coupled catching time and terminal state while considering the dynamic feasibility and safety. We also propose a flexible constraint transcription method to catch targets at any reasonable attitude and terminal position bias. The proposed Catch Planner provides a new paradigm for the combination of learning and planning and is integrated on the quadrotor designed by ourselves, which runs at 100hz on the onboard computer. Extensive experiments are carried out in real and simulated scenes to verify the robustness of the proposed method and its expansibility when facing a variety of high-speed flying targets.


Roller-Quadrotor: A Novel Hybrid Terrestrial/Aerial Quadrotor with Unicycle-Driven and Rotor-Assisted Turning

arXiv.org Artificial Intelligence

The Roller-Quadrotor is a novel quadrotor that combines the maneuverability of aerial drones with the endurance of ground vehicles. This work focuses on the design, modeling, and experimental validation of the Roller-Quadrotor. Flight capabilities are achieved through a quadrotor configuration, with four thrust-providing actuators. Additionally, rolling motion is facilitated by a unicycle-driven and rotor-assisted turning structure. By utilizing terrestrial locomotion, the vehicle can overcome rolling and turning resistance, thereby conserving energy compared to its flight mode. This innovative approach not only tackles the inherent challenges of traditional rotorcraft but also enables the vehicle to navigate through narrow gaps and overcome obstacles by taking advantage of its aerial mobility. We develop comprehensive models and controllers for the Roller-Quadrotor and validate their performance through experiments. The results demonstrate its seamless transition between aerial and terrestrial locomotion, as well as its ability to safely navigate through gaps half the size of its diameter. Moreover, the terrestrial range of the vehicle is approximately 2.8 times greater, while the operating time is about 41.2 times longer compared to its aerial capabilities. These findings underscore the feasibility and effectiveness of the proposed structure and control mechanisms for efficient navigation through challenging terrains while conserving energy.


Hedonic Prices and Quality Adjusted Price Indices Powered by AI

arXiv.org Artificial Intelligence

Accurate, real-time measurements of price index changes using electronic records are essential for tracking inflation and productivity in today's economic environment. We develop empirical hedonic models that can process large amounts of unstructured product data (text, images, prices, quantities) and output accurate hedonic price estimates and derived indices. To accomplish this, we generate abstract product attributes, or ``features,'' from text descriptions and images using deep neural networks, and then use these attributes to estimate the hedonic price function. Specifically, we convert textual information about the product to numeric features using large language models based on transformers, trained or fine-tuned using product descriptions, and convert the product image to numeric features using a residual network model. To produce the estimated hedonic price function, we again use a multi-task neural network trained to predict a product's price in all time periods simultaneously. To demonstrate the performance of this approach, we apply the models to Amazon's data for first-party apparel sales and estimate hedonic prices. The resulting models have high predictive accuracy, with $R^2$ ranging from $80\%$ to $90\%$. Finally, we construct the AI-based hedonic Fisher price index, chained at the year-over-year frequency. We contrast the index with the CPI and other electronic indices.


Federated Ensemble Model-based Reinforcement Learning in Edge Computing

arXiv.org Artificial Intelligence

Federated learning (FL) is a privacy-preserving distributed machine learning paradigm that enables collaborative training among geographically distributed and heterogeneous devices without gathering their data. Extending FL beyond the supervised learning models, federated reinforcement learning (FRL) was proposed to handle sequential decision-making problems in edge computing systems. However, the existing FRL algorithms directly combine model-free RL with FL, thus often leading to high sample complexity and lacking theoretical guarantees. To address the challenges, we propose a novel FRL algorithm that effectively incorporates model-based RL and ensemble knowledge distillation into FL for the first time. Specifically, we utilise FL and knowledge distillation to create an ensemble of dynamics models for clients, and then train the policy by solely using the ensemble model without interacting with the environment. Furthermore, we theoretically prove that the monotonic improvement of the proposed algorithm is guaranteed. The extensive experimental results demonstrate that our algorithm obtains much higher sample efficiency compared to classic model-free FRL algorithms in the challenging continuous control benchmark environments under edge computing settings. The results also highlight the significant impact of heterogeneous client data and local model update steps on the performance of FRL, validating the insights obtained from our theoretical analysis.


Learning to Memorize Entailment and Discourse Relations for Persona-Consistent Dialogues

arXiv.org Artificial Intelligence

Maintaining engagement and consistency is particularly important in dialogue systems. Existing works have improved the performance of dialogue systems by intentionally learning interlocutor personas with sophisticated network structures. One issue with this approach is that it requires more personal corpora with annotations. Additionally, these models typically perform the next utterance prediction to generate a response but neglect the discourse coherence in the entire conversation. To address these issues, this study proposes a method of learning to memorize entailment and discourse relations for persona-consistent dialogue tasks. Entailment text pairs in natural language inference dataset were applied to learn latent entailment relations as external memories by premise-to-hypothesis generation task. Furthermore, an internal memory with a similar architecture was applied to the discourse information in the dialogue. Placing orthogonality restrictions on these two memory spaces ensures that the latent entailment relations remain dialogue-independent. Both memories collaborate to obtain entailment and discourse representation for the generation, allowing a deeper understanding of both consistency and coherence. Experiments on two large public datasets, PersonaChat and DSTC7-AVSD, demonstrated the effectiveness of the proposed method. Both automatic and human evaluations indicate that the proposed model outperforms several strong baselines in terms of both persona consistency and response coherence. Our source code is available at https://github.com/Chenrj233/LMEDR.


Joint Multimodal Entity-Relation Extraction Based on Edge-enhanced Graph Alignment Network and Word-pair Relation Tagging

arXiv.org Artificial Intelligence

Multimodal named entity recognition (MNER) and multimodal relation extraction (MRE) are two fundamental subtasks in the multimodal knowledge graph construction task. However, the existing methods usually handle two tasks independently, which ignores the bidirectional interaction between them. This paper is the first to propose jointly performing MNER and MRE as a joint multimodal entity-relation extraction task (JMERE). Besides, the current MNER and MRE models only consider aligning the visual objects with textual entities in visual and textual graphs but ignore the entity-entity relationships and object-object relationships. To address the above challenges, we propose an edge-enhanced graph alignment network and a word-pair relation tagging (EEGA) for JMERE task. Specifically, we first design a word-pair relation tagging to exploit the bidirectional interaction between MNER and MRE and avoid the error propagation. Then, we propose an edge-enhanced graph alignment network to enhance the JMERE task by aligning nodes and edges in the cross-graph. Compared with previous methods, the proposed method can leverage the edge information to auxiliary alignment between objects and entities and find the correlations between entity-entity relationships and object-object relationships. Experiments are conducted to show the effectiveness of our model.