Goto

Collaborating Authors

 Lazio


Self-Supervised Data Generation for Precision Agriculture: Blending Simulated Environments with Real Imagery

arXiv.org Artificial Intelligence

In precision agriculture, the scarcity of labeled data and significant covariate shifts pose unique challenges for training machine learning models. This scarcity is particularly problematic due to the dynamic nature of the environment and the evolving appearance of agricultural subjects as living things. We propose a novel system for generating realistic synthetic data to address these challenges. Utilizing a vineyard simulator based on the Unity engine, our system employs a cut-and-paste technique with geometrical consistency considerations to produce accurate photo-realistic images and labels from synthetic environments to train detection algorithms. This approach generates diverse data samples across various viewpoints and lighting conditions. We demonstrate considerable performance improvements in training a state-of-the-art detector by applying our method to table grapes cultivation. The combination of techniques can be easily automated, an increasingly important consideration for adoption in agricultural practice.


Ship in Sight: Diffusion Models for Ship-Image Super Resolution

arXiv.org Artificial Intelligence

In recent years, remarkable advancements have been achieved in the field of image generation, primarily driven by the escalating demand for high-quality outcomes across various image generation subtasks, such as inpainting, denoising, and super resolution. A major effort is devoted to exploring the application of super-resolution techniques to enhance the quality of low-resolution images. In this context, our method explores in depth the problem of ship image super resolution, which is crucial for coastal and port surveillance. We investigate the opportunity given by the growing interest in text-to-image diffusion models, taking advantage of the prior knowledge that such foundation models have already learned. In particular, we present a diffusion-model-based architecture that leverages text conditioning during training while being class-aware, to best preserve the crucial details of the ships during the generation of the super-resoluted image. Since the specificity of this task and the scarcity availability of off-the-shelf data, we also introduce a large labeled ship dataset scraped from online ship images, mostly from ShipSpotting\footnote{\url{www.shipspotting.com}} website. Our method achieves more robust results than other deep learning models previously employed for super resolution, as proven by the multiple experiments performed. Moreover, we investigate how this model can benefit downstream tasks, such as classification and object detection, thus emphasizing practical implementation in a real-world scenario. Experimental results show flexibility, reliability, and impressive performance of the proposed framework over state-of-the-art methods for different tasks. The code is available at: https://github.com/LuigiSigillo/ShipinSight .


Proceedings 12th International Workshop on Theorem proving components for Educational software

arXiv.org Artificial Intelligence

The ThEdu series pursues the smooth transition from an intuitive way of doing mathematics at secondary school to a more formal approach to the subject in STEM education, while favouring software support for this transition by exploiting the power of theorem-proving technologies. What follows is a brief description of how the present volume contributes to this enterprise. The 12th International Workshop on Theorem Proving Components for Educational Software(ThEdu'23), was a satellite event of the 29th international Conference on Automated Deduction (CADE 2023), July 1-4, 2023, Rome, Italy. ThEdu'23 was very successful, with one invited talk, by Yves Bertot (Inria, France), "The challenges of using Type Theory to teach Mathematics", and seven regular contributions. An open call for papers was then issued, to which eight contributions were submitted. Seven submissions have been accepted by our reviewers, who jointly produced at least three careful reports on each of the contributions. The resulting revised papers are collected in the present volume. We, the volume editors, hope that this collection of papers will further promote the development of theorem-proving based software, and that it will allow to improve the mutual understanding between computer scientists, mathematicians and stakeholders in education. PC Chairs:Julien Narboux (University of Strasbourg, France); Walther Neuper (JKU, Johannes Kepler University, Linz, Austria); Pedro Quaresma (University of Coimbra, Portugal)


Visual Action Planning with Multiple Heterogeneous Agents

arXiv.org Artificial Intelligence

Visual planning methods are promising to handle complex settings where extracting the system state is challenging. However, none of the existing works tackles the case of multiple heterogeneous agents which are characterized by different capabilities and/or embodiment. In this work, we propose a method to realize visual action planning in multi-agent settings by exploiting a roadmap built in a low-dimensional structured latent space and used for planning. To enable multi-agent settings, we infer possible parallel actions from a dataset composed of tuples associated with individual actions. Next, we evaluate feasibility and cost of them based on the capabilities of the multi-agent system and endow the roadmap with this information, building a capability latent space roadmap (C-LSR). Additionally, a capability suggestion strategy is designed to inform the human operator about possible missing capabilities when no paths are found. The approach is validated in a simulated burger cooking task and a real-world box packing task.


Leveraging Large Language Models for Preliminary Security Risk Analysis: A Mission-Critical Case Study

arXiv.org Artificial Intelligence

Preliminary security risk analysis (PSRA) provides a quick approach to identify, evaluate and propose remeditation to potential risks in specific scenarios. The extensive expertise required for an effective PSRA and the substantial ammount of textual-related tasks hinder quick assessments in mission-critical contexts, where timely and prompt actions are essential. The speed and accuracy of human experts in PSRA significantly impact response time. A large language model can quickly summarise information in less time than a human. To our knowledge, no prior study has explored the capabilities of fine-tuned models (FTM) in PSRA. Our case study investigates the proficiency of FTM to assist practitioners in PSRA. We manually curated 141 representative samples from over 50 mission-critical analyses archived by the industrial context team in the last five years.We compared the proficiency of the FTM versus seven human experts. Within the industrial context, our approach has proven successful in reducing errors in PSRA, hastening security risk detection, and minimizing false positives and negatives. This translates to cost savings for the company by averting unnecessary expenses associated with implementing unwarranted countermeasures. Therefore, experts can focus on more comprehensive risk analysis, leveraging LLMs for an effective preliminary assessment within a condensed timeframe.


Generalizing Medical Image Representations via Quaternion Wavelet Networks

arXiv.org Artificial Intelligence

Neural network generalizability is becoming a broad research field due to the increasing availability of datasets from different sources and for various tasks. This issue is even wider when processing medical data, where a lack of methodological standards causes large variations being provided by different imaging centers or acquired with various devices and cofactors. To overcome these limitations, we introduce a novel, generalizable, data- and task-agnostic framework able to extract salient features from medical images. The proposed quaternion wavelet network (QUAVE) can be easily integrated with any pre-existing medical image analysis or synthesis task, and it can be involved with real, quaternion, or hypercomplex-valued models, generalizing their adoption to single-channel data. QUAVE first extracts different sub-bands through the quaternion wavelet transform, resulting in both low-frequency/approximation bands and high-frequency/fine-grained features. Then, it weighs the most representative set of sub-bands to be involved as input to any other neural model for image processing, replacing standard data samples. We conduct an extensive experimental evaluation comprising different datasets, diverse image analysis, and synthesis tasks including reconstruction, segmentation, and modality translation. We also evaluate QUAVE in combination with both real and quaternion-valued models. Results demonstrate the effectiveness and the generalizability of the proposed framework that improves network performance while being flexible to be adopted in manifold scenarios and robust to domain shifts. The full code is available at: https://github.com/ispamm/QWT.


Evaluating Gesture Recognition in Virtual Reality

arXiv.org Artificial Intelligence

Human-Robot Interaction (HRI) has become increasingly important as robots are being integrated into various aspects of daily life. One key aspect of HRI is gesture recognition, which allows robots to interpret and respond to human gestures in real-time. Gesture recognition plays an important role in non-verbal communication in HRI. To this aim, there is ongoing research on how such non-verbal communication can strengthen verbal communication and improve the system's overall efficiency, thereby enhancing the user experience with the robot. However, several challenges need to be addressed in gesture recognition systems, which include data generation, transferability, scalability, generalizability, standardization, and lack of benchmarking of the gestural systems. In this preliminary paper, we want to address the challenges of data generation using virtual reality simulations and standardization issues by presenting gestures to some commands that can be used as a standard in ground robots.


Natural Language based Context Modeling and Reasoning for Ubiquitous Computing with Large Language Models: A Tutorial

arXiv.org Artificial Intelligence

Large language models (LLMs) have become phenomenally surging, since 2018--two decades after introducing context-awareness into computing systems. Through taking into account the situations of ubiquitous devices, users and the societies, context-aware computing has enabled a wide spectrum of innovative applications, such as assisted living, location-based social network services and so on. To recognize contexts and make decisions for actions accordingly, various artificial intelligence technologies, such as Ontology and OWL, have been adopted as representations for context modeling and reasoning. Recently, with the rise of LLMs and their improved natural language understanding and reasoning capabilities, it has become feasible to model contexts using natural language and perform context reasoning by interacting with LLMs such as ChatGPT and GPT-4. In this tutorial, we demonstrate the use of texts, prompts, and autonomous agents (AutoAgents) that enable LLMs to perform context modeling and reasoning without requiring fine-tuning of the model. We organize and introduce works in the related field, and name this computing paradigm as the LLM-driven Context-aware Computing (LCaC). In the LCaC paradigm, users' requests, sensors reading data, and the command to actuators are supposed to be represented as texts. Given the text of users' request and sensor data, the AutoAgent models the context by prompting and sends to the LLM for context reasoning. LLM generates a plan of actions and responds to the AutoAgent, which later follows the action plan to foster context-awareness. To prove the concepts, we use two showcases--(1) operating a mobile z-arm in an apartment for assisted living, and (2) planning a trip and scheduling the itinerary in a context-aware and personalized manner.


AI in an ancient city: Can technology help you on your European vacation?

NBC News Top Stories

"When in Rome, do as the Romans do," the proverb says. But what if you're only in the Italian capital for just one day and you're keen to fit in as much of its history and culture as possible? Sure, you could take a few hours out to plan your trip or you could try to book a tour guide to take you round "The Eternal City." But now there's a third option: Tourism apps, websites and chatbots that use artificial intelligence to tailor itineraries for the user based on their preferences and time. They're rapidly popping up, so NBC News decided to put three of them to the test.


Spectral indices in remote sensing- part-1

#artificialintelligence

Spectral Indices (SIs) are mathematical equations applied to each pixel image to highlight a specific phenomenon on the ground. Most SIs are computed from the reflectance data produced after some pre-processing stages of multispectral remote sensing images. In which bx and by are the reflectance values of a pixel in bands x and y. If we calculate the value of a SI for each pixel, we can generate an image from SI. In this post, I want to talk about the two most important spectral indices and how to calculate them for a case study in the center of Rome, Italy, using the Sentinel-hub cloud platform.