 intensity value


General Intelligence-based Fragmentation (GIF): A framework for peak-labeled spectra simulation

Martin, Margaret R., Hassoun, Soha

arXiv.org Artificial Intelligence

Despite growing reference libraries and advanced computational tools, progress in the field of metabolomics remains constrained by low rates of annotating measured spectra. The recent developments of large language models (LLMs) have led to strong performance across a wide range of generation and reasoning tasks, spurring increased interest in LLMs' application to domain-specific scientific challenges, such as mass spectra annotation. Here, we present a novel framework, General Intelligence-based Fragmentation (GIF), that guides pretrained LLMs through spectra simulation using structured prompting and reasoning. GIF utilizes tagging, structured inputs/outputs, system prompts, instruction-based prompts, and iterative refinement. Indeed, GIF offers a structured alternative to ad hoc prompting, underscoring the need for systematic guidance of LLMs on complex scientific tasks. Using GIF, we evaluate current generalist LLMs' ability to use reasoning towards fragmentation and to perform intensity prediction after fine-tuning. We benchmark performance on a novel QA dataset, the MassSpecGym QA-sim dataset, that we derive from the MassSpecGym dataset. Through these implementations of GIF, we find that GPT-4o and GPT-4o-mini achieve a cosine similarity of 0.36 and 0.35 between the simulated and true spectra, respectively, outperforming other pretrained models including GPT-5, Llama-3.1, and ChemDFM, despite GPT-5's recency and ChemDFM's domain specialization. GIF outperforms several deep learning baselines. Our evaluation of GIF highlights the value of using LLMs not only for spectra simulation but for enabling human-in-the-loop workflows and structured, explainable reasoning in molecular fragmentation.
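
The cosine similarity reported above compares a simulated spectrum with the measured one. The following is a minimal sketch of one common way to compute it, assuming spectra are given as lists of (m/z, intensity) peaks and are binned by m/z before comparison. The function names and the 1.0 bin width are illustrative assumptions, not details taken from the abstract.

```python
import math


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins.

    peaks: iterable of (mz, intensity) pairs.
    bin_width is an assumed value, not one stated in the abstract.
    """
    binned = {}
    for mz, intensity in peaks:
        key = int(mz // bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(peaks_a, peaks_b, bin_width=1.0):
    """Cosine similarity between two binned spectra (0.0 if either is empty)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Identical spectra score 1 and spectra with no shared bins score 0, so the reported values of 0.36 and 0.35 indicate partial overlap between simulated and measured peaks.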


Multi-Analyte, Swab-based Automated Wound Monitor with AI

Sikha, Madhu Babu, Appari, Lalith, Ganesh, Gurudatt Nanjanagudu, Bandodkar, Amay, Banerjee, Imon

arXiv.org Artificial Intelligence

Diabetic foot ulcers (DFUs), a class of chronic wounds, affect ~750,000 individuals every year in the US alone, and early identification of non-healing DFUs that develop into chronic wounds can drastically reduce treatment costs and minimize the risk of amputation. There is therefore a pressing need for diagnostic tools that can detect non-healing DFUs early. We develop low-cost, multi-analyte, 3D-printed assays seamlessly integrated on swabs that can identify non-healing DFUs, together with a Wound Sensor iOS App, an innovative mobile application developed for the controlled acquisition and automated analysis of wound sensor data. We developed automated computer vision techniques that compare density changes between the original base image (taken before exposure to the wound) and the wound-exposed image, which allows us to automatically determine the severity of the wound. The iOS app ensures accurate data collection and presents actionable insights, despite challenges such as variations in camera configurations and ambient conditions. The proposed integrated sensor and iOS app will allow healthcare professionals to monitor wound conditions in real time, track healing progress, and assess critical parameters related to wound care.


MinkUNeXt-SI: Improving point cloud-based place recognition including spherical coordinates and LiDAR intensity

Vilella-Cantos, Judith, Cabrera, Juan José, Payá, Luis, Ballesta, Mónica, Valiente, David

arXiv.org Artificial Intelligence

In autonomous navigation systems, solving the place recognition problem is crucial for safe operation. The problem is not trivial, however, since a solution must be accurate regardless of changes in the scene, such as seasonal changes and different weather conditions, and it must generalize to other environments. This paper presents our method, MinkUNeXt-SI, which, starting from a LiDAR point cloud, preprocesses the input data to obtain, for each point, its spherical coordinates and an intensity value normalized to the range 0 to 1, and produces a robust place recognition descriptor. To that end, a deep learning approach that combines Minkowski convolutions and a U-net architecture with skip connections is used. The results of MinkUNeXt-SI demonstrate that this method reaches and surpasses state-of-the-art performance while also generalizing satisfactorily to other datasets. Additionally, we showcase the capture of a custom dataset and its use in evaluating our solution, which also achieves outstanding results. Both the code of our solution and the runs of our dataset are publicly available for reproducibility purposes.
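
The preprocessing step described above (spherical coordinates plus intensity normalized to 0 to 1) can be sketched as follows. This is an illustrative sketch only: the angle convention (azimuth in the x-y plane, elevation above it) and per-scan min-max normalization are assumptions, and the function names are hypothetical rather than taken from the authors' code.

```python
import math


def to_spherical(x, y, z):
    """Convert Cartesian coordinates to (range, azimuth, elevation) in radians.

    Assumed convention: azimuth measured in the x-y plane from the x axis,
    elevation measured up from the x-y plane.
    """
    r = math.sqrt(x * x + y * y + z * z)
    azimuth = math.atan2(y, x)
    elevation = math.atan2(z, math.hypot(x, y))
    return r, azimuth, elevation


def normalize_intensity(values):
    """Min-max scale intensities to [0, 1] (assumed per-scan normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def preprocess(points):
    """points: list of (x, y, z, intensity) tuples.

    Returns a list of (range, azimuth, elevation, normalized_intensity).
    """
    intensities = normalize_intensity([p[3] for p in points])
    return [
        to_spherical(p[0], p[1], p[2]) + (i,)
        for p, i in zip(points, intensities)
    ]
```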


KEVS: Enhancing Segmentation of Visceral Adipose Tissue in Pre-Cystectomy CT with Gaussian Kernel Density Estimation

Boucher, Thomas, Tetlow, Nicholas, Fung, Annie, Dewar, Amy, Arina, Pietro, Kerneis, Sven, Whittle, John, Mazomenos, Evangelos B.

arXiv.org Artificial Intelligence

Purpose: The distribution of visceral adipose tissue (VAT) in cystectomy patients is indicative of the incidence of post-operative complications. Existing VAT segmentation methods for computed tomography (CT) employing intensity thresholding have limitations relating to inter-observer variability. Moreover, the difficulty in creating ground-truth masks limits the development of deep learning (DL) models for this task. This paper introduces a novel method for VAT prediction in pre-cystectomy CT, which is fully automated and does not require ground-truth VAT masks for training, overcoming the aforementioned limitations. Methods: We introduce the Kernel density Enhanced VAT Segmentator (KEVS), combining a DL semantic segmentation model, for multi-body feature prediction, with Gaussian kernel density estimation analysis of predicted subcutaneous adipose tissue to achieve accurate scan-specific predictions of VAT in the abdominal cavity. Uniquely for a DL pipeline, KEVS does not require ground-truth VAT masks. Results: We verify the ability of KEVS to accurately segment abdominal organs in unseen CT data and compare KEVS VAT segmentation predictions to existing state-of-the-art (SOTA) approaches in a dataset of 20 pre-cystectomy CT scans, collected from University College London Hospital (UCLH-Cyst), with expert ground-truth annotations. KEVS presents a 4.80% and 6.02% improvement in Dice Coefficient over the second-best DL and thresholding-based VAT segmentation techniques, respectively, when evaluated on UCLH-Cyst. Conclusion: This research introduces KEVS, an automated, SOTA method for the prediction of VAT in pre-cystectomy CT which eliminates inter-observer variability and is trained entirely on open-source CT datasets which do not contain ground-truth VAT masks.
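
The Dice Coefficient used for the comparison above measures overlap between a predicted mask and a ground-truth mask as twice the intersection divided by the sum of the two mask sizes. A minimal sketch for flat binary masks follows. Returning 1.0 when both masks are empty is a convention assumed here, not something stated in the abstract.

```python
def dice_coefficient(pred, truth):
    """Dice overlap between two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / total
```

For example, a prediction that marks two voxels, one of which is among the two true voxels, scores 2 * 1 / (2 + 2) = 0.5.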


Synth It Like KITTI: Synthetic Data Generation for Object Detection in Driving Scenarios

Marcus, Richard, Vogel, Christian, Jatzkowski, Inga, Knoop, Niklas, Stamminger, Marc

arXiv.org Artificial Intelligence

An important factor in advancing autonomous driving systems is simulation. Yet, there has been rather little progress on transferability between the virtual and real world. We revisit this problem for 3D object detection on LiDAR point clouds and propose a dataset generation pipeline based on the CARLA simulator. Utilizing domain randomization strategies and careful modeling, we are able to train an object detector on the synthetic data and demonstrate strong generalization capabilities to the KITTI dataset. Furthermore, we compare different virtual sensor variants to gather insights into which sensor attributes can be responsible for the prevalent domain gap. Finally, fine-tuning with a small portion of real data almost matches the baseline, and fine-tuning with the full training set slightly surpasses it.


Edge Attention Module for Object Classification

Roy, Santanu, Suresh, Ashvath, Gupta, Archit

arXiv.org Artificial Intelligence

A novel "edge attention-based Convolutional Neural Network (CNN)" is proposed in this research for the object classification task. With the advent of advanced computing technology, CNN models have achieved remarkable success, particularly in computer vision applications. Nevertheless, the efficacy of the conventional CNN is often hindered by class imbalance and inter-class similarity problems, which are particularly prominent in the computer vision field. In this research, we introduce for the first time an "Edge Attention Module (EAM)" consisting of a Max-Min pooling layer followed by convolutional layers. This Max-Min pooling is an entirely novel pooling technique, specifically designed to capture only the edge information that is crucial for any object classification task. Therefore, by integrating this novel pooling technique into the attention module, the CNN inherently prioritizes essential edge features, thereby boosting the accuracy and F1-score of the model significantly. We have implemented our proposed EAM or 2EAMs on several standard pre-trained CNN models for the Caltech-101, Caltech-256, CIFAR-100 and Tiny ImageNet-200 datasets. The extensive experiments reveal that our proposed framework (that is, EAM with CNN and 2EAMs with CNN) outperforms all pre-trained CNN models as well as the recent models "Pooling-based Vision Transformer (PiT)", "Convolutional Block Attention Module (CBAM)", and ConvNext by substantial margins. We have achieved accuracies of 95.5% and 86% with the proposed framework on the Caltech-101 and Caltech-256 datasets, respectively. To the best of our knowledge, these are the best results on these datasets so far.
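
The abstract does not define Max-Min pooling. One plausible reading, sketched below purely as an assumption, is that each pooling window outputs the difference between its maximum and minimum value, so flat regions map to zero and windows containing an intensity edge produce a large response. The function name, window size, and non-overlapping stride are all hypothetical.

```python
def max_min_pool(image, size=2):
    """Assumed Max-Min pooling: max minus min over non-overlapping windows.

    image: list of equal-length rows of numbers (a single-channel image).
    size: window height and width (hypothetical default).
    """
    rows, cols = len(image), len(image[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        pooled_row = []
        for c in range(0, cols - size + 1, size):
            window = [
                image[r + i][c + j]
                for i in range(size)
                for j in range(size)
            ]
            # Flat windows give 0; windows spanning an edge give a large value.
            pooled_row.append(max(window) - min(window))
        pooled.append(pooled_row)
    return pooled
```

Under this reading, a window that straddles a dark-to-bright boundary responds strongly while a uniformly bright window does not, which is consistent with the stated goal of capturing only edge information.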


Generative Precipitation Downscaling using Score-based Diffusion with Wasserstein Regularization

Liu, Yuhao, Doss-Gollin, James, Balakrishnan, Guha, Veeraraghavan, Ashok

arXiv.org Artificial Intelligence

Understanding local risks from extreme rainfall, such as flooding, requires both long records (to sample rare events) and high-resolution products (to assess localized hazards). Unfortunately, there is a dearth of long-record and high-resolution products that can be used to understand local risk and precipitation science. In this paper, we present a novel generative diffusion model that downscales (super-resolves) globally available Climate Prediction Center (CPC) gauge-based precipitation products and ERA5 reanalysis data to generate kilometer-scale precipitation estimates. Downscaling gauge-based precipitation from 55 km to 1 km while recovering extreme rainfall signals poses significant challenges. To encourage our model (named WassDiff) to produce well-calibrated precipitation intensity values, we introduce a Wasserstein Distance Regularization (WDR) term for the score-matching training objective in the diffusion denoising process. We show that WDR greatly enhances the model's ability to capture extreme values compared to diffusion without WDR. Extensive evaluation shows that WassDiff has better reconstruction accuracy and bias scores than conventional score-based diffusion models. Case studies of extreme weather phenomena, like tropical storms and cold fronts, demonstrate WassDiff's ability to produce appropriate spatial patterns while capturing extremes. Such downscaling capability enables the generation of extensive km-scale precipitation datasets from existing historical global gauge records and current gauge measurements in areas without high-resolution radar.
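
To illustrate why a Wasserstein distance is sensitive to miscalibrated intensity values, the sketch below computes the 1-Wasserstein distance between two equal-sized one-dimensional samples, which reduces to the mean absolute difference of their sorted values. This is only the generic distance; how the authors incorporate it into the score-matching objective is not described in the abstract, and the function name is hypothetical.

```python
def wasserstein_1d(sample_a, sample_b):
    """1-Wasserstein distance between two equal-sized 1D empirical samples."""
    if not sample_a or len(sample_a) != len(sample_b):
        raise ValueError("samples must be non-empty and of equal size")
    pairs = zip(sorted(sample_a), sorted(sample_b))
    return sum(abs(a - b) for a, b in pairs) / len(sample_a)
```

Because the sorted values are matched rank by rank, a prediction that misses a single extreme value is penalized in proportion to the size of that miss, whereas spatial rearrangement of the same values costs nothing.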


An Optimized Toolbox for Advanced Image Processing with Tsetlin Machine Composites

Grønningsæter, Ylva, Smørvik, Halvor S., Granmo, Ole-Christoffer

arXiv.org Artificial Intelligence

The Tsetlin Machine (TM) has achieved competitive results on several image classification benchmarks, including MNIST, K-MNIST, F-MNIST, and CIFAR-2. However, color image classification is arguably still in its infancy for TMs, with CIFAR-10 being a focal point for tracking progress. Over the past few years, TM's CIFAR-10 accuracy has increased from around 61% in 2020 to 75.1% in 2023 with the introduction of Drop Clause. In this paper, we leverage the recently proposed TM Composites architecture and introduce a range of TM Specialists that use various image processing techniques. These include Canny edge detection, Histogram of Oriented Gradients, adaptive mean thresholding, adaptive Gaussian thresholding, Otsu's thresholding, color thermometers, and adaptive color thermometers. In addition, we conduct a rigorous hyperparameter search, where we uncover optimal hyperparameters for several of the TM Specialists. The result is a toolbox that provides new state-of-the-art results on CIFAR-10 for TMs with an accuracy of 82.8%. In conclusion, our toolbox of TM Specialists forms a foundation for new TM applications and a landmark for further research on TM Composites in image analysis.
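
Tsetlin Machines operate on Boolean inputs, so color values need a binary encoding such as the thermometers mentioned above. The following is a minimal sketch of generic thermometer encoding for one 8-bit channel value, assuming evenly spaced thresholds. The number of levels and the threshold placement are illustrative assumptions and are not the toolbox's actual settings.

```python
def thermometer_encode(value, levels=4, max_value=255):
    """Encode a channel value as `levels` bits, one per threshold.

    Bit i is 1 when the value reaches threshold i, so larger values
    switch on more bits from left to right. Thresholds are evenly
    spaced (an assumption made for this sketch).
    """
    thresholds = [
        (i + 1) * (max_value + 1) / (levels + 1) for i in range(levels)
    ]
    return [1 if value >= t else 0 for t in thresholds]
```

With four levels the thresholds are 51.2, 102.4, 153.6, and 204.8, so a value of 100 encodes as [1, 0, 0, 0] and a value of 255 as [1, 1, 1, 1].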


Toward Physics-Aware Deep Learning Architectures for LiDAR Intensity Simulation

Anand, Vivek, Lohani, Bharat, Pandey, Gaurav, Mishra, Rakesh

arXiv.org Artificial Intelligence

Autonomous vehicles (AVs) heavily rely on LiDAR perception for environment understanding and navigation. LiDAR intensity provides valuable information about the reflected laser signals and plays a crucial role in enhancing the perception capabilities of AVs. However, accurately simulating LiDAR intensity remains a challenge due to the unavailability of material properties of the objects in the environment and the complex interactions between the laser beam and the environment. The proposed method aims to improve the accuracy of intensity simulation by incorporating physics-based modalities within the deep learning framework. One of the key quantities that captures the interaction between the laser beam and the objects is the angle of incidence. In this work, we demonstrate that adding the LiDAR incidence angle as a separate input to the deep neural networks significantly enhances the results. We present a comparative study between two prominent deep learning architectures: U-NET, a Convolutional Neural Network (CNN), and Pix2Pix, a Generative Adversarial Network (GAN). We implemented these two architectures for the intensity prediction task and used the SemanticKITTI and VoxelScape datasets for experiments. The comparative analysis reveals that both architectures benefit from the incidence angle as an additional input. Moreover, the Pix2Pix architecture outperforms U-NET, especially when the incidence angle is incorporated.
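
The incidence angle referred to above is the angle between the laser ray and the surface normal at the point it hits. A minimal sketch of that geometry follows, assuming the sensor position and a per-point surface normal are already known. How the authors estimate normals is not stated in the abstract, and the function name is hypothetical.

```python
import math


def incidence_angle(point, normal, sensor=(0.0, 0.0, 0.0)):
    """Angle in radians between the laser ray and the surface normal.

    point, normal, sensor: 3-tuples. The sensor defaults to the origin
    (an assumption). The absolute value makes the result independent of
    whether the normal points toward or away from the sensor.
    """
    ray = tuple(p - s for p, s in zip(point, sensor))
    ray_len = math.sqrt(sum(c * c for c in ray))
    normal_len = math.sqrt(sum(c * c for c in normal))
    if ray_len == 0.0 or normal_len == 0.0:
        raise ValueError("ray and normal must be non-zero vectors")
    dot = sum(r * n for r, n in zip(ray, normal))
    cos_theta = abs(dot) / (ray_len * normal_len)
    # Clamp to guard against rounding slightly above 1.
    return math.acos(min(1.0, cos_theta))
```

An angle of 0 means the beam hits the surface head-on, and angles approaching pi/2 correspond to grazing hits, which typically return less energy.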


Reflectivity Is All You Need!: Advancing LiDAR Semantic Segmentation

Viswanath, Kasi, Jiang, Peng, Saripalli, Srikanth

arXiv.org Artificial Intelligence

LiDAR semantic segmentation frameworks predominantly leverage geometry-based features to differentiate objects within a scan. While these methods excel in scenarios with clear boundaries and distinct shapes, their performance declines in environments where boundaries are blurred, particularly in off-road contexts. To address this, recent strides in 3D segmentation algorithms have focused on harnessing raw LiDAR intensity measurements to improve prediction accuracy. Despite these efforts, current learning-based models struggle to capture the intricate relationships between raw intensity and factors such as distance, incidence angle, material reflectivity, and atmospheric conditions. Building upon our prior work, this paper delves into the advantages of employing calibrated intensity (also referred to as reflectivity) within learning-based LiDAR semantic segmentation frameworks. We first establish that incorporating reflectivity as an input enhances an existing LiDAR semantic segmentation model. Furthermore, we present findings showing that enabling the model to learn to calibrate intensity can boost its performance. Through extensive experimentation on the off-road dataset Rellis-3D, we demonstrate notable improvements. Specifically, converting intensity to reflectivity results in a 4% increase in mean Intersection over Union (mIoU) compared to using raw intensity in off-road scenarios. Additionally, we investigate the possible benefits of using calibrated intensity for semantic segmentation in urban environments (SemanticKITTI) and for cross-sensor domain adaptation.
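
The mIoU metric quoted above averages, over classes, the ratio of correctly labeled points to the union of predicted and true points for that class. A minimal sketch for per-point integer labels follows. Skipping classes that appear in neither the prediction nor the ground truth is a convention assumed here.

```python
def mean_iou(pred, truth, num_classes):
    """Mean Intersection over Union for per-point integer class labels."""
    if len(pred) != len(truth):
        raise ValueError("label sequences must have the same length")
    ious = []
    for cls in range(num_classes):
        inter = sum(1 for p, t in zip(pred, truth) if p == cls and t == cls)
        union = sum(1 for p, t in zip(pred, truth) if p == cls or t == cls)
        if union > 0:
            # Assumed convention: classes absent from both are ignored.
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0
```

For predictions [0, 0, 1, 1] against ground truth [0, 1, 1, 1], class 0 scores 1/2 and class 1 scores 2/3, giving an mIoU of about 0.583.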