Goto

Collaborating Authors

 South America


Image registration is a geometric deep learning task

arXiv.org Artificial Intelligence

Data-driven deformable image registration methods predominantly rely on operations that process grid-like inputs. However, applying deformable transformations to an image results in a warped space that deviates from a rigid grid structure. Consequently, data-driven approaches with sequential deformations have to apply grid resampling operations between each deformation step. While artifacts caused by resampling are negligible in high-resolution images, the resampling of sparse, high-dimensional feature grids introduces errors that affect the deformation modeling process. Taking inspiration from Lagrangian reference frames of deformation fields, our work introduces a novel paradigm for data-driven deformable image registration that utilizes geometric deep-learning principles to model deformations without grid requirements. Specifically, we model image features as a set of nodes that freely move in Euclidean space, update their coordinates under graph operations, and dynamically readjust their local neighborhoods. We employ this formulation to construct a multi-resolution deformable registration model, where deformation layers iteratively refine the overall transformation at each resolution without intermediate resampling operations on the feature grids. We investigate our method's ability to fully deformably capture large deformations across a number of medical imaging registration tasks. In particular, we apply our approach (GeoReg) to the registration of inter-subject brain MR images and inhale-exhale lung CT images, showing on par performance with the current state-of-the-art methods. We believe our contribution open up avenues of research to reduce the black-box nature of current learned registration paradigms by explicitly modeling the transformation within the architecture.


Concurrent vertical and horizontal federated learning with fuzzy cognitive maps

arXiv.org Artificial Intelligence

Federated learning (FL) is an emerging distributed artificial In the context of Fuzzy Cognitive Maps (FCMs), federated intelligence framework that enables privacy-preserving machine learning (FL) is employed to address several intrinsic challenges learning by synthesizing local models instead of sharing associated with these models. FL offers an effective actual data [1]. The general fundamental process can be outlined approach to managing these challenges, enhancing the performance as follows [2]: the federation process is initiated by a and applicability of FCMs. One necessary issue in the single server or participant who provides an initial model for FCMs is the decentralised nature of data sources and the need individual participants to train using their local data. These to preserve data privacy and security while enabling collaborative participants then share the model's weights or gradients with model development. FCM models frequently rely on the server (or other participants) for aggregation, typically using data distributed across multiple locations or organisations.


Stiefel Flow Matching for Moment-Constrained Structure Elucidation

arXiv.org Artificial Intelligence

Molecular structure elucidation is a fundamental step in understanding chemical phenomena, with applications in identifying molecules in natural products, lab syntheses, forensic samples, and the interstellar medium. We consider the task of predicting a molecule's all-atom 3D structure given only its molecular formula and moments of inertia, motivated by the ability of rotational spectroscopy to measure these moments. While existing generative models can conditionally sample 3D structures with approximately correct moments, this soft conditioning fails to leverage the many digits of precision afforded by experimental rotational spectroscopy. To address this, we first show that the space of n-atom point clouds with a fixed set of moments of inertia is embedded in the Stiefel manifold St(n, 4). We then propose Stiefel Flow Matching as a generative model for elucidating 3D structure under exact moment constraints. Additionally, we learn simpler and shorter flows by finding approximate solutions for equivariant optimal transport on the Stiefel manifold. Empirically, enforcing exact moment constraints allows Stiefel Flow Matching to achieve higher success rates and faster sampling than Euclidean diffusion models, even on high-dimensional manifolds corresponding to large molecules in the GEOM dataset. Elucidating the structure of unknown molecules is a central task in chemistry, important for analyzing environmental samples (Moneta et al., 2023), identifying novel drugs (Sonstrom et al., 2023), and determining potential building blocks of life in the interstellar medium (McGuire et al., 2016). The challenge is to aggregate information from multiple sources of analytical data to unambiguously determine a molecule's structure. Rotational spectroscopy holds a unique capacity to provide precise measurements of a molecule's rotational constants, which are closely related to its moments of inertia. In turn, the connection between these moments and 3D structure has routinely provided the highest quality gas-phase 3D structures attainable from experiment (Domingos et al., 2020). Typically, structure elucidation with rotational spectroscopy proceeds by confirming whether a known structure's moments match with experiment (Lee & McCarthy, 2019; McCarthy et al., 2020). However, this approach is inherently restricted to molecules whose structures have already been catalogued, and leaves no prescription for undiscovered molecules such as novel natural products and key reactive intermediate species that cannot be easily isolated (Womack et al., 2015).


Automated Phytosensing: Ozone Exposure Classification Based on Plant Electrical Signals

arXiv.org Artificial Intelligence

In our project WatchPlant, we propose to use a decentralized network of living plants as air-quality sensors by measuring their electrophysiology to infer the environmental state, also called phytosensing. We conducted in-lab experiments exposing ivy (Hedera helix) plants to ozone, an important pollutant to monitor, and measured their electrophysiological response. However, there is no well established automated way of detecting ozone exposure in plants. We propose a generic automatic toolchain to select a high-performance subset of features and highly accurate models for plant electrophysiology. Our approach derives plant- and stimulus-generic features from the electrophysiological signal using the tsfresh library. Based on these features, we automatically select and optimize machine learning models using AutoML. We use forward feature selection to increase model performance. We show that our approach successfully classifies plant ozone exposure with accuracies of up to 94.6% on unseen data. We also show that our approach can be used for other plant species and stimuli. Our toolchain automates the development of monitoring algorithms for plants as pollutant monitors. Our results help implement significant advancements for phytosensing devices contributing to the development of cost-effective, high-density urban air monitoring systems in the future.


WHAT-IF: Exploring Branching Narratives by Meta-Prompting Large Language Models

arXiv.org Artificial Intelligence

WHAT-IF--Writing a Hero's Alternate Timeline through Interactive Fiction--is a system that uses zero-shot meta-prompting to create branching narratives from a prewritten story. Played as an interactive fiction (IF) game, WHAT-IF lets the player choose between decisions that the large language model (LLM) GPT-4 generates as possible branches in the story. Starting with an existing linear plot as input, a branch is created at each key decision taken by the main character. By meta-prompting the LLM to consider the major plot points from the story, the system produces coherent and well-structured alternate storylines. WHAT-IF stores the branching plot tree in a graph which helps it to both keep track of the story for prompting and maintain the structure for the final IF system. Figure 1: The WHAT-IF user interface, filled with the A video demo of our system can be found here: main character, title, and the plot of the TV show WandaVision https://youtu.be/8vBqjqtupcc.


Uchaguzi-2022: A Dataset of Citizen Reports on the 2022 Kenyan Election

arXiv.org Artificial Intelligence

Online reporting platforms have enabled citizens around the world to collectively share their opinions and report in real time on events impacting their local communities. Systematically organizing (e.g., categorizing by attributes) and geotagging large amounts of crowdsourced information is crucial to ensuring that accurate and meaningful insights can be drawn from this data and used by policy makers to bring about positive change. These tasks, however, typically require extensive manual annotation efforts. In this paper we present Uchaguzi-2022, a dataset of 14k categorized and geotagged citizen reports related to the 2022 Kenyan General Election containing mentions of election-related issues such as official misconduct, vote count irregularities, and acts of violence. We use this dataset to investigate whether language models can assist in scalably categorizing and geotagging reports, thus highlighting its potential application in the AI for Social Good space.


License Plate Detection and Character Recognition Using Deep Learning and Font Evaluation

arXiv.org Artificial Intelligence

License plate detection (LPD) is essential for traffic management, vehicle tracking, and law enforcement but faces challenges like variable lighting and diverse font types, impacting accuracy. Traditionally reliant on image processing and machine learning, the field is now shifting towards deep learning for its robust performance in various conditions. Current methods, however, often require tailoring to specific regional datasets. This paper proposes a dual deep learning strategy using a Faster R-CNN for detection and a CNN-RNN model with Connectionist Temporal Classification (CTC) loss and a MobileNet V3 backbone for recognition. This approach aims to improve model performance using datasets from Ontario, Quebec, California, and New York State, achieving a recall rate of 92% on the Centre for Pattern Recognition and Machine Intelligence (CENPARMI) dataset and 90% on the UFPR-ALPR dataset. It includes a detailed error analysis to identify the causes of false positives. Additionally, the research examines the role of font features in license plate (LP) recognition, analyzing fonts like Driver Gothic, Dreadnought, California Clarendon, and Zurich Extra Condensed with the OpenALPR system. It discovers significant performance discrepancies influenced by font characteristics, offering insights for future LPD system enhancements.


The Reliability Paradox: Exploring How Shortcut Learning Undermines Language Model Calibration

arXiv.org Artificial Intelligence

The advent of pre-trained language models (PLMs) has enabled significant performance gains in the field of natural language processing. However, recent studies have found PLMs to suffer from miscalibration, indicating a lack of accuracy in the confidence estimates provided by these models. Current evaluation methods for PLM calibration often assume that lower calibration error estimates indicate more reliable predictions. However, fine-tuned PLMs often resort to shortcuts, leading to overconfident predictions that create the illusion of enhanced performance but lack generalizability in their decision rules. The relationship between PLM reliability, as measured by calibration error, and shortcut learning, has not been thoroughly explored thus far. This paper aims to investigate this relationship, studying whether lower calibration error implies reliable decision rules for a language model. Our findings reveal that models with seemingly superior calibration portray higher levels of non-generalizable decision rules. This challenges the prevailing notion that well-calibrated models are inherently reliable. Our study highlights the need to bridge the current gap between language model calibration and generalization objectives, urging the development of comprehensive frameworks to achieve truly robust and reliable language models.


Big Tech Will Scour the Globe in Its Search for Cheap Energy

WIRED

On the southern tip of Malaysia lies the state of Johor, renowned for its beaches and mountainous jungle. But Johor has a new boom industry: data centers to power generative AI, with Microsoft committing more than 2 billion on just such a data center. For the tech giants, electricity has become the new oil. A state-of-the-art AI data center might need 90 MW, enough to power tens of thousands of American homes. With AI applications proliferating, from chatbots to AI agents, needs are growing.


Graph Learning in the Era of LLMs: A Survey from the Perspective of Data, Models, and Tasks

arXiv.org Artificial Intelligence

With the increasing prevalence of cross-domain Text-Attributed Graph (TAG) Data (e.g., citation networks, recommendation systems, social networks, and ai4science), the integration of Graph Neural Networks (GNNs) and Large Language Models (LLMs) into a unified Model architecture (e.g., LLM as enhancer, LLM as collaborators, LLM as predictor) has emerged as a promising technological paradigm. The core of this new graph learning paradigm lies in the synergistic combination of GNNs' ability to capture complex structural relationships and LLMs' proficiency in understanding informative contexts from the rich textual descriptions of graphs. Therefore, we can leverage graph description texts with rich semantic context to fundamentally enhance Data quality, thereby improving the representational capacity of model-centric approaches in line with data-centric machine learning principles. By leveraging the strengths of these distinct neural network architectures, this integrated approach addresses a wide range of TAG-based Task (e.g., graph learning, graph reasoning, and graph question answering), particularly in complex industrial scenarios (e.g., supervised, few-shot, and zero-shot settings). In other words, we can treat text as a medium to enable cross-domain generalization of graph learning Model, allowing a single graph model to effectively handle the diversity of downstream graph-based Task across different data domains. This work serves as a foundational reference for researchers and practitioners looking to advance graph learning methodologies in the rapidly evolving landscape of LLM. We consistently maintain the related open-source materials at \url{https://github.com/xkLi-Allen/Awesome-GNN-in-LLMs-Papers}.