Goto

Collaborating Authors

 Energy


Multimodal AI Systems for Enhanced Laying Hen Welfare Assessment and Productivity Optimization

arXiv.org Artificial Intelligence

The future of poultry production depends on a paradigm shift replacing subjective, labor-intensive welfare checks with data-driven, intelligent monitoring ecosystems. Traditional welfare assessments-limited by human observation and single-sensor data-cannot fully capture the complex, multidimensional nature of laying hen welfare in modern farms. Multimodal Artificial Intelligence (AI) offers a breakthrough, integrating visual, acoustic, environmental, and physiological data streams to reveal deeper insights into avian welfare dynamics. This investigation highlights multimodal As transformative potential, showing that intermediate (feature-level) fusion strategies achieve the best balance between robustness and performance under real-world poultry conditions, and offer greater scalability than early or late fusion approaches. Key adoption barriers include sensor fragility in harsh farm environments, high deployment costs, inconsistent behavioral definitions, and limited cross-farm generalizability. To address these, we introduce two novel evaluation tools - the Domain Transfer Score (DTS) to measure model adaptability across diverse farm settings, and the Data Reliability Index (DRI) to assess sensor data quality under operational constraints. We also propose a modular, context-aware deployment framework designed for laying hen environments, enabling scalable and practical integration of multimodal sensing. This work lays the foundation for a transition from reactive, unimodal monitoring to proactive, precision-driven welfare systems that unite productivity with ethical, science based animal care.


Feedback Control of a Single-Tail Bioinspired 59-mg Swimmer

arXiv.org Artificial Intelligence

We present an evolved steerable version of the single-tail Fish-&-Ribbon-Inspired Small Swimming Harmonic roBot (FRISSHBot), a 59-mg biologically inspired swimmer, which is driven by a new shape-memory alloy (SMA)-based bimorph actuator. The new FRISSHBot is controllable in the two-dimensional (2D) space, which enabled the first demonstration of feedback-controlled trajectory tracking of a single-tail aquatic robot with onboard actuation at the subgram scale. These new capabilities are the result of a physics-informed design with an enlarged head and shortened tail relative to those of the original platform. Enhanced by its design, this new platform achieves forward swimming speeds of up to 13.6 mm/s (0.38 Bl/s), which is over four times that of the original platform. Furthermore, when following 2D references in closed loop, the tested FRISSHBot prototype attains forward swimming speeds of up to 9.1 mm/s, root-mean-square (RMS) tracking errors as low as 2.6 mm, turning rates of up to 13.1 ยฐ/s, and turning radii as small as 10 mm.


A Survey on Agentic Service Ecosystems: Measurement, Analysis, and Optimization

arXiv.org Artificial Intelligence

The Agentic Service Ecosystem consists of heterogeneous autonomous agents (e.g., intelligent machines, humans, and human-machine hybrid systems) that interact through resource exchange and service co-creation. These agents, with distinct behaviors and motivations, exhibit autonomous perception, reasoning, and action capabilities, which increase system complexity and make traditional linear analysis methods inadequate. Swarm intelligence, characterized by decentralization, self-organization, emergence, and dynamic adaptability, offers a novel theoretical lens and methodology for understanding and optimizing such ecosystems. However, current research, owing to fragmented perspectives and cross-ecosystem differences, fails to comprehensively capture the complexity of swarm-intelligence emergence in agentic contexts. The lack of a unified methodology further limits the depth and systematic treatment of the research. This paper proposes a framework for analyzing the emergence of swarm intelligence in Agentic Service Ecosystems, with three steps: measurement, analysis, and optimization, to reveal the cyclical mechanisms and quantitative criteria that foster emergence. By reviewing existing technologies, the paper analyzes their strengths and limitations, identifies unresolved challenges, and shows how this framework provides both theoretical support and actionable methods for real-world applications.


Adapting LLMs to Time Series Forecasting via Temporal Heterogeneity Modeling and Semantic Alignment

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have recently demonstrated impressive capabilities in natural language processing due to their strong generalization and sequence modeling capabilities. However, their direct application to time series forecasting remains challenging due to two fundamental issues: the inherent heterogeneity of temporal patterns and the modality gap between continuous numerical signals and discrete language representations. In this work, we propose TALON, a unified framework that enhances LLM-based forecasting by modeling temporal heterogeneity and enforcing semantic alignment. Specifically, we design a Heterogeneous Temporal Encoder that partitions multivariate time series into structurally coherent segments, enabling localized expert modeling across diverse temporal patterns. To bridge the modality gap, we introduce a Semantic Alignment Module that aligns temporal features with LLM-compatible representations, enabling effective integration of time series into language-based models while eliminating the need for handcrafted prompts during inference. Extensive experiments on seven real-world benchmarks demonstrate that TALON achieves superior performance across all datasets, with average MSE improvements of up to 11\% over recent state-of-the-art methods. These results underscore the effectiveness of incorporating both pattern-aware and semantic-aware designs when adapting LLMs for time series forecasting. The code is available at: https://github.com/syrGitHub/TALON.


From Time-series Generation, Model Selection to Transfer Learning: A Comparative Review of Pixel-wise Approaches for Large-scale Crop Mapping

arXiv.org Artificial Intelligence

Crop mapping involves identifying and classifying crop types using spatial data, primarily derived from remote sensing imagery. This study presents the first comprehensive review of large-scale, pixel-wise crop mapping workflows, encompassing both conventional supervised methods and emerging transfer learning approaches. To identify the optimal time-series generation approaches and supervised crop mapping models, we conducted systematic experiments, comparing six widely adopted satellite image-based preprocessing methods, alongside eleven supervised pixel-wise classification models. Additionally, we assessed the synergistic impact of varied training sample sizes and variable combinations. Moreover, we identified optimal transfer learning techniques for different magnitudes of domain shift. The evaluation of optimal methods was conducted across five diverse agricultural sites. Landsat 8 served as the primary satellite data source. Labels come from CDL trusted pixels and field surveys. Our findings reveal three key insights. First, fine-scale interval preprocessing paired with Transformer models consistently delivered optimal performance for both supervised and transferable workflows. RF offered rapid training and competitive performance in conventional supervised learning and direct transfer to similar domains. Second, transfer learning techniques enhanced workflow adaptability, with UDA being effective for homogeneous crop classes while fine-tuning remains robust across diverse scenarios. Finally, workflow choice depends heavily on the availability of labeled samples. With a sufficient sample size, supervised training typically delivers more accurate and generalizable results. Below a certain threshold, transfer learning that matches the level of domain shift is a viable alternative to achieve crop mapping. All code is publicly available to encourage reproducibility practice.


Is Single-View Mesh Reconstruction Ready for Robotics?

arXiv.org Artificial Intelligence

This paper evaluates single-view mesh reconstruction models for their potential in enabling instant digital twin creation for real-time planning and dynamics prediction using physics simulators for robotic manipulation. Recent single-view 3D reconstruction advances offer a promising avenue toward an automated real-to-sim pipeline: directly mapping a single observation of a scene into a simulation instance by reconstructing scene objects as individual, complete, and physically plausible 3D meshes. However, their suitability for physics simulations and robotics applications under immediacy, physical fidelity, and simulation readiness remains underexplored. We establish robotics-specific benchmarking criteria for 3D reconstruction, including handling typical inputs, collision-free and stable geometry, occlusions robustness, and meeting computational constraints. Our empirical evaluation using realistic robotics datasets shows that despite success on computer vision benchmarks, existing approaches fail to meet robotics-specific requirements. We quantitively examine limitations of single-view reconstruction for practical robotics implementation, in contrast to prior work that focuses on multi-view approaches. Our findings highlight critical gaps between computer vision advances and robotics needs, guiding future research at this intersection.


Embodied intelligent industrial robotics: Concepts and techniques

arXiv.org Artificial Intelligence

In order to work more efficiently, accurately, reliably, and safely in industrial scenarios, robots should have at least general knowledge, working-environment knowledge, and operating-object knowledge. These pose significant challenges to existing embodied intelligent robotics (EIR) techniques. Thus, this paper first briefly reviews the history of industrial robotics and analyzes the limitations of mainstream EIR frameworks. Then, a knowledge-driven technical framework of embodied intelligent industrial robotics (EIIR) is proposed for various industrial environments. It has five modules: a world model, a high-level task planner, a low-level skill controller, a simulator, and a physical system. The development of techniques related to each module are also thoroughly reviewed, and recent progress regarding their adaption to industrial applications are discussed. A case study is given to demonstrate the newly proposed EIIR framework's applicability to real-world assembly system. Finally, the key challenges that EIIR encounters in industrial scenarios are summarized and future research directions are suggested. The authors believe that EIIR technology is shaping the next generation of industrial robotics and EIIR-based industrial systems supply a new technological paradigm for intelligent manufacturing. It is expected that this review could serve as a valuable reference for scholars and engineers that are interested in industrial embodied intelligence. Together, scholars can use this research to drive their rapid advancement and application of EIIR techniques. The interested authors would continue to track and contribute new studies in the project page https://github.com/jackyzengl/EIIR.


Reconstruction of Solar EUV Irradiance Using CaII K Images and SOHO/SEM Data with Bayesian Deep Learning and Uncertainty Quantification

arXiv.org Artificial Intelligence

Solar extreme ultraviolet (EUV) irradiance plays a crucial role in heating the Earth's ionosphere, thermosphere, and mesosphere, affecting atmospheric dynamics over varying time scales. Although significant effort has been spent studying short-term EUV variations from solar transient events, there is little work to explore the long-term evolution of the EUV flux over multiple solar cycles. Continuous EUV flux measurements have only been available since 1995, leaving significant gaps in earlier data. In this study, we propose a Bayesian deep learning model, named SEMNet, to fill the gaps. We validate our approach by applying SEMNet to construct SOHO/SEM EUV flux measurements in the period between 1998 and 2014 using CaII K images from the Precision Solar Photometric Telescope. We then extend SEMNet through transfer learning to reconstruct solar EUV irradiance in the period between 1950 and 1960 using CaII K images from the Kodaikanal Solar Observatory. Experimental results show that SEMNet provides reliable predictions along with uncertainty bounds, demonstrating the feasibility of CaII K images as a robust proxy for long-term EUV fluxes. These findings contribute to a better understanding of solar influences on Earth's climate over extended periods.


TerraMAE: Learning Spatial-Spectral Representations from Hyperspectral Earth Observation Data via Adaptive Masked Autoencoders

arXiv.org Artificial Intelligence

Hyperspectral satellite imagery offers sub-30 m views of Earth in hundreds of contiguous spectral bands, enabling fine-grained mapping of soils, crops, and land cover. While self-supervised Masked Autoencoders excel on RGB and low-band multispectral data, they struggle to exploit the intricate spatial-spectral correlations in 200+ band hyperspectral images. We introduce TerraMAE, a novel HSI encoding framework specifically designed to learn highly representative spatial-spectral embeddings for diverse geospatial analyses. TerraMAE features an adaptive channel grouping strategy, based on statistical reflectance properties to capture spectral similarities, and an enhanced reconstruction loss function that incorporates spatial and spectral quality metrics. We demonstrate TerraMAE's effectiveness through superior spatial-spectral information preservation in high-fidelity image reconstruction. Furthermore, we validate its practical utility and the quality of its learned representations through strong performance on three key downstream geospatial tasks: crop identification, land cover classification, and soil texture prediction.


Can Multitask Learning Enhance Model Explainability?

arXiv.org Artificial Intelligence

Remote sensing provides satellite data in diverse types and formats. The usage of multimodal learning networks exploits this diversity to improve model performance, except that the complexity of such networks comes at the expense of their interpretability. In this study, we explore how modalities can be leveraged through multitask learning to intrinsically explain model behavior. In particular, instead of additional inputs, we use certain modalities as additional targets to be predicted along with the main task. The success of this approach relies on the rich information content of satellite data, which remains as input modalities. We show how this modeling context provides numerous benefits: (1) in case of data scarcity, the additional modalities do not need to be collected for model inference at deployment, (2) the model performance remains comparable to the multimodal baseline performance, and in some cases achieves better scores, (3) prediction errors in the main task can be explained via the model behavior in the auxiliary task(s). We demonstrate the efficiency of our approach on three datasets, including segmentation, classification, and regression tasks.