Goto

Collaborating Authors

 Sarajevo


Illegal Waste Detection in Remote Sensing Images: A Case Study

Gibellini, Federico, Fraternali, Piero, Boracchi, Giacomo, Morandini, Luca, Diecidue, Andrea, Malegori, Simona

arXiv.org Artificial Intelligence

Environmental crime currently represents the third largest criminal activity worldwide while threatening ecosystems as well as human health. Among the crimes related to this activity, improper waste management can nowadays be countered more easily thanks to the increasing availability and decreasing cost of Very-High-Resolution Remote Sensing images, which enable semi-automatic territory scanning in search of illegal landfills. This paper proposes a pipeline, developed in collaboration with professionals from a local environmental agency, for detecting candidate illegal dumping sites leveraging a classifier of Remote Sensing images. To identify the best configuration for such classifier, an extensive set of experiments was conducted and the impact of diverse image characteristics and training settings was thoroughly analyzed. The local environmental agency was then involved in an experimental exercise where outputs from the developed classifier were integrated in the experts' everyday work, resulting in time savings with respect to manual photo-interpretation. The classifier was eventually run with valuable results on a location outside of the training area, highlighting potential for cross-border applicability of the proposed pipeline.


Deep Learning Models for Physical Layer Communications

Letizia, Nunzio A.

arXiv.org Artificial Intelligence

The increased availability of data and computing resources has enabled researchers to successfully adopt machine learning (ML) techniques and make significant contributions in several engineering areas. ML and in particular deep learning (DL) algorithms have shown to perform better in tasks where a physical bottom-up description of the phenomenon is lacking and/or is mathematically intractable. Indeed, they take advantage of the observations of natural phenomena to automatically acquire knowledge and learn internal relations. Despite the historical model-based mindset, communications engineering recently started shifting the focus towards top-down data-driven learning models, especially in domains such as channel modeling and physical layer design, where in most of the cases no general optimal strategies are known. In this thesis, we aim at solving some fundamental open challenges in physical layer communications exploiting new DL paradigms. In particular, we mathematically formulate, under ML terms, classic problems such as channel capacity and optimal coding-decoding schemes, for any arbitrary communication medium. We design and develop the architecture, algorithm and code necessary to train the equivalent DL model, and finally, we propose novel solutions to long-standing problems in the field.


Diffusion Instruction Tuning

Jin, Chen, Tanno, Ryutaro, Saseendran, Amrutha, Diethe, Tom, Teare, Philip

arXiv.org Artificial Intelligence

We introduce Lavender, a simple supervised fine-tuning (SFT) method that boosts the performance of advanced vision-language models (VLMs) by leveraging state-of-the-art image generation models such as Stable Diffusion. Specifically, Lavender aligns the text-vision attention in the VLM transformer with the equivalent used by Stable Diffusion during SFT, instead of adapting separate encoders. This alignment enriches the model's visual understanding and significantly boosts performance across in- and out-of-distribution tasks. Lavender requires just 0.13 million training examples, 2.5% of typical large-scale SFT datasets, and fine-tunes on standard hardware (8 GPUs) in a single day. It consistently improves state-of-the-art open-source multimodal LLMs (e.g., Llama-3.2-11B, MiniCPM-Llama3-v2.5), achieving up to 30% gains and a 68% boost on challenging out-of-distribution medical QA tasks. By efficiently transferring the visual expertise of image generators with minimal supervision, Lavender offers a scalable solution for more accurate vision-language systems. All code, training data, and models will be shared at https://astrazeneca.github.io/vlm/.


Real-Time Sampling-Based Safe Motion Planning for Robotic Manipulators in Dynamic Environments

Covic, Nermin, Lacevic, Bakir, Osmankovic, Dinko, Uzunovic, Tarik

arXiv.org Artificial Intelligence

In this paper, we present the main features of Dynamic Rapidly-exploring Generalized Bur Tree (DRGBT) algorithm, a sampling-based planner for dynamic environments. We provide a detailed time analysis and appropriate scheduling to facilitate a real-time operation. To this end, an extensive analysis is conducted to identify the time-critical routines and their dependence on the number of obstacles. Furthermore, information about the distance to obstacles is used to compute a structure called dynamic expanded bubble of free configuration space, which is then utilized to establish sufficient conditions for a guaranteed safe motion of the robot while satisfying all kinematic constraints. An extensive randomized simulation trial is conducted to compare the proposed algorithm to a competing state-of-the-art method. Finally, an experimental study on a real robot is carried out covering a variety of scenarios including those with human presence. The results show the effectiveness and feasibility of real-time execution of the proposed motion planning algorithm within a typical sensor-based arrangement, using cheap hardware and sequential architecture, without the necessity for GPUs or heavy parallelization.


Knowledge Distillation for Real-Time Classification of Early Media in Voice Communications

Altwlkany, Kemal, Hadžić, Hadžem, Kurić, Amar, Lacic, Emanuel

arXiv.org Artificial Intelligence

This paper investigates the industrial setting of real-time classification of early media exchanged during the initialization phase of voice calls. We explore the application of state-of-the-art audio tagging models and highlight some limitations when applied to the classification of early media. While most existing approaches leverage convolutional neural networks, we propose a novel approach for low-resource requirements based on gradient-boosted trees. Our approach not only demonstrates a substantial improvement in runtime performance, but also exhibits a comparable accuracy. We show that leveraging knowledge distillation and class aggregation techniques to train a simpler and smaller model accelerates the classification of early media in voice calls. We provide a detailed analysis of the results on a proprietary and publicly available dataset, regarding accuracy and runtime performance. We additionally report a case study of the achieved performance improvements at a regional data center in India.


Towards Probabilistic Planning of Explanations for Robot Navigation

Halilovic, Amar, Krivic, Senka

arXiv.org Artificial Intelligence

In robotics, ensuring that autonomous systems are comprehensible and accountable to users is essential for effective human-robot interaction. This paper introduces a novel approach that integrates user-centered design principles directly into the core of robot path planning processes. We propose a probabilistic framework for automated planning of explanations for robot navigation, where the preferences of different users regarding explanations are probabilistically modeled to tailor the stochasticity of the real-world human-robot interaction and the communication of decisions of the robot and its actions towards humans. This approach aims to enhance the transparency of robot path planning and adapt to diverse user explanation needs by anticipating the types of explanations that will satisfy individual users.


News-Driven Stock Price Forecasting in Indian Markets: A Comparative Study of Advanced Deep Learning Models

Attaluri, Kaushal, Tripathi, Mukesh, Reddy, Srinithi, Shivendra, null

arXiv.org Artificial Intelligence

Forecasting stock market prices remains a complex challenge for traders, analysts, and engineers due to the multitude of factors that influence price movements. Recent advancements in artificial intelligence (AI) and natural language processing (NLP) have significantly enhanced stock price prediction capabilities. AI's ability to process vast and intricate data sets has led to more sophisticated forecasts. However, achieving consistently high accuracy in stock price forecasting remains elusive. In this paper, we leverage 30 years of historical data from national banks in India, sourced from the National Stock Exchange, to forecast stock prices. Our approach utilizes state-of-the-art deep learning models, including multivariate multi-step Long Short-Term Memory (LSTM), Facebook Prophet with LightGBM optimized through Optuna, and Seasonal Auto-Regressive Integrated Moving Average (SARIMA). We further integrate sentiment analysis from tweets and reliable financial sources such as Business Standard and Reuters, acknowledging their crucial influence on stock price fluctuations.


A Recurrent Neural Network Approach to the Answering Machine Detection Problem

Altwlkany, Kemal, Delalic, Sead, Selmanovic, Elmedin, Alihodzic, Adis, Lovric, Ivica

arXiv.org Artificial Intelligence

In the field of telecommunications and cloud communications, accurately and in real-time detecting whether a human or an answering machine has answered an outbound call is of paramount importance. This problem is of particular significance during campaigns as it enhances service quality, efficiency and cost reduction through precise caller identification. Despite the significance of the field, it remains inadequately explored in the existing literature. This paper presents an innovative approach to answering machine detection that leverages transfer learning through the YAMNet model for feature extraction. The YAMNet architecture facilitates the training of a recurrent-based classifier, enabling real-time processing of audio streams, as opposed to fixed-length recordings. The results demonstrate an accuracy of over 96% on the test set. Furthermore, we conduct an in-depth analysis of misclassified samples and reveal that an accuracy exceeding 98% can be achieved with the integration of a silence detection algorithm, such as the one provided by FFmpeg.


Multi-Class Plant Leaf Disease Detection: A CNN-based Approach with Mobile App Integration

Foysal, Md Aziz Hosen, Ahmed, Foyez, Haque, Md Zahurul

arXiv.org Artificial Intelligence

Prompt and accurate detection is crucial for the efficient management and mitigation of plant diseases. This study investigates advanced techniques in plant disease detection, emphasizing the integration of image processing, machine learning, deep learning methods, and mobile technologies. High-resolution images of plant leaves were captured and analyzed using convolutional neural networks (CNNs) to detect symptoms of various diseases, such as blight, mildew, and rust. This study explores 14 classes of plants and diagnoses 26 unique plant diseases. We focus on common diseases affecting various crops. The model was trained on a diverse dataset encompassing multiple crops and disease types, achieving 98.14% accuracy in disease diagnosis. Finally integrated this model into mobile apps for real-time disease diagnosis.


The Language of Trauma: Modeling Traumatic Event Descriptions Across Domains with Explainable AI

Schirmer, Miriam, Leemann, Tobias, Kasneci, Gjergji, Pfeffer, Jürgen, Jurgens, David

arXiv.org Artificial Intelligence

Psychological trauma can manifest following various distressing events and is captured in diverse online contexts. However, studies traditionally focus on a single aspect of trauma, often neglecting the transferability of findings across different scenarios. We address this gap by training language models with progressing complexity on trauma-related datasets, including genocide-related court data, a Reddit dataset on post-traumatic stress disorder (PTSD), counseling conversations, and Incel forum posts. Our results show that the fine-tuned RoBERTa model excels in predicting traumatic events across domains, slightly outperforming large language models like GPT-4. Additionally, SLALOM-feature scores and conceptual explanations effectively differentiate and cluster trauma-related language, highlighting different trauma aspects and identifying sexual abuse and experiences related to death as a common traumatic event across all datasets. This transferability is crucial as it allows for the development of tools to enhance trauma detection and intervention in diverse populations and settings.