Goto

Collaborating Authors

 Overview


Convolutional neural networks for mineral prospecting through alteration mapping with remote sensing data

arXiv.org Artificial Intelligence

Traditional geological mapping methods, which rely on field observations and rock sample analysis, are ine fficient for continuous spatial mapping of geological features such as alteration zones. Deep learning models such as convolutional neural networks (CNNs) have ushered in a transformative era in remote sensing data analysis. CNNs excel in automatically extracting features from image data for classification and regression problems. CNNs have the ability to pinpoint specific mineralogical changes attributed to mineralisation processes by discerning subtle features within remote sensing data. Our methodology involves model training using two distinct sets of training samples generated through ground truth data and a fully automated approach through selective principal component analysis (PCA). We also compare CNNs with conventional machine learning models, including k-nearest neighbours, support vector machines, and multilayer perceptron. Our findings indicate that training with a ground truth-based dataset produces more reliable alteration maps. Additionally, we find that CNNs perform slightly better when compared to conventional machine learning models, which further demonstrates the ability of CNNs to capture spatial patterns in remote sensing data e ffectively. We find that Landsat 9 surpasses Landsat 8 in mapping iron oxide areas when employing the CNNs model trained with ground truth data obtained by field surveys. We also observe that using ASTER data with the CNNs model trained on the ground truth-based dataset produces the most accurate maps for two other important types of alteration zones, argillic and propylitic. This underscores the utility of CNNs in enhancing the e fficiency and precision of geological mapping, particularly in discerning subtle alterations indicative of mineralisation processes, especially those associated with critical metal resources. Introduction Geological maps are traditionally crafted through ground surveys and founded on field observations. They frequently incur inevitable errors due to the lack of spatial continuity of the field observations, thus yielding inaccurate representations (Campbell et al., 2005). Recognising these limitations, geologists have been prompted to seek innovative approaches and e fficient methodologies to accurately map geological features, particularly alteration zones (Kesler, 2007; McCuaig et al., 2010). The utilisation of remote sensing data for alteration mapping emerges as a pivotal technique in regional mineral exploration, enabling the precise spatial identification of alteration zones associated with mineralisation processes (Mohamed et al., 2021).


Diffusion Models for Tabular Data: Challenges, Current Progress, and Future Directions

arXiv.org Artificial Intelligence

In recent years, generative models have achieved remarkable performance across diverse applications, including image generation, text synthesis, audio creation, video generation, and data augmentation. Diffusion models have emerged as superior alternatives to Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs) by addressing their limitations, such as training instability, mode collapse, and poor representation of multimodal distributions. This success has spurred widespread research interest. In the domain of tabular data, diffusion models have begun to showcase similar advantages over GANs and VAEs, achieving significant performance breakthroughs and demonstrating their potential for addressing unique challenges in tabular data modeling. However, while domains like images and time series have numerous surveys summarizing advancements in diffusion models, there remains a notable gap in the literature for tabular data. Despite the increasing interest in diffusion models for tabular data, there has been little effort to systematically review and summarize these developments. This lack of a dedicated survey limits a clear understanding of the challenges, progress, and future directions in this critical area. This survey addresses this gap by providing a comprehensive review of diffusion models for tabular data. Covering works from June 2015, when diffusion models emerged, to December 2024, we analyze nearly all relevant studies, with updates maintained in a \href{https://github.com/Diffusion-Model-Leiden/awesome-diffusion-models-for-tabular-data}{GitHub repository}. Assuming readers possess foundational knowledge of statistics and diffusion models, we employ mathematical formulations to deliver a rigorous and detailed review, aiming to promote developments in this emerging and exciting area.


Vision Language Models in Medicine

arXiv.org Artificial Intelligence

With the advent of Vision-Language Models (VLMs), medical artificial intelligence (AI) has experienced significant technological progress and paradigm shifts. This survey provides an extensive review of recent advancements in Medical Vision-Language Models (Med-VLMs), which integrate visual and textual data to enhance healthcare outcomes. We discuss the foundational technology behind Med-VLMs, illustrating how general models are adapted for complex medical tasks, and examine their applications in healthcare. The transformative impact of Med-VLMs on clinical practice, education, and patient care is highlighted, alongside challenges such as data scarcity, narrow task generalization, interpretability issues, and ethical concerns like fairness, accountability, and privacy. These limitations are exacerbated by uneven dataset distribution, computational demands, and regulatory hurdles. Rigorous evaluation methods and robust regulatory frameworks are essential for safe integration into healthcare workflows. Future directions include leveraging large-scale, diverse datasets, improving cross-modal generalization, and enhancing interpretability. Innovations like federated learning, lightweight architectures, and Electronic Health Record (EHR) integration are explored as pathways to democratize access and improve clinical relevance. This review aims to provide a comprehensive understanding of Med-VLMs' strengths and limitations, fostering their ethical and balanced adoption in healthcare.


Sample Selection via Contrastive Fragmentation for Noisy Label Regression

arXiv.org Artificial Intelligence

As with many other problems, real-world regression is plagued by the presence of noisy labels, an inevitable issue that demands our attention. Fortunately, much real-world data often exhibits an intrinsic property of continuously ordered correlations between labels and features, where data points with similar labels are also represented with closely related features. In response, we propose a novel approach named ConFrag, where we collectively model the regression data by transforming them into disjoint yet contrasting fragmentation pairs. This enables the training of more distinctive representations, enhancing the ability to select clean samples. Our ConFrag framework leverages a mixture of neighboring fragments to discern noisy labels through neighborhood agreement among expert feature extractors. We extensively perform experiments on six newly curated benchmark datasets of diverse domains, including age prediction, price prediction, and music production year estimation. We also introduce a metric called Error Residual Ratio (ERR) to better account for varying degrees of label noise. Our approach consistently outperforms fourteen state-of-the-art baselines, being robust against symmetric and random Gaussian label noise.


On the usability of generative AI: Human generative AI

arXiv.org Artificial Intelligence

Generative AI systems are transforming content creation, but their usability remains a key challenge. This paper examines usability factors such as user experience, transparency, control, and cognitive load. Common challenges include unpredictability and difficulties in fine-tuning outputs. We review evaluation metrics like efficiency, learnability, and satisfaction, highlighting best practices from various domains. Improving interpretability, intuitive interfaces, and user feedback can enhance usability, making generative AI more accessible and effective.


From Perceptions to Decisions: Wildfire Evacuation Decision Prediction with Behavioral Theory-informed LLMs

arXiv.org Artificial Intelligence

Evacuation decision prediction is critical for efficient and effective wildfire response by helping emergency management anticipate traffic congestion and bottlenecks, allocate resources, and minimize negative impacts. Traditional statistical methods for evacuation decision prediction fail to capture the complex and diverse behavioral logic of different individuals. In this work, for the first time, we introduce FLARE, short for facilitating LLM for advanced reasoning on wildfire evacuation decision prediction, a Large Language Model (LLM)-based framework that integrates behavioral theories and models to streamline the Chain-of-Thought (CoT) reasoning and subsequently integrate with memory-based Reinforcement Learning (RL) module to provide accurate evacuation decision prediction and understanding. Our proposed method addresses the limitations of using existing LLMs for evacuation behavioral predictions, such as limited survey data, mismatching with behavioral theory, conflicting individual preferences, implicit and complex mental states, and intractable mental state-behavior mapping. Experiments on three post-wildfire survey datasets show an average of 20.47% performance improvement over traditional theory-informed behavioral models, with strong cross-event generalizability. Our complete code is publicly available at https://github.com/SusuXu-s-Lab/FLARE


Socratic: Enhancing Human Teamwork via AI-enabled Coaching

arXiv.org Artificial Intelligence

Coaches are vital for effective collaboration, but cost and resource constraints often limit their availability during real-world tasks. This limitation poses serious challenges in life-critical domains that rely on effective teamwork, such as healthcare and disaster response. To address this gap, we propose and realize an innovative application of AI: task-time team coaching. Specifically, we introduce Socratic, a novel AI system that complements human coaches by providing real-time guidance during task execution. Socratic monitors team behavior, detects misalignments in team members' shared understanding, and delivers automated interventions to improve team performance. We validated Socratic through two human subject experiments involving dyadic collaboration. The results demonstrate that the system significantly enhances team performance with minimal interventions. Participants also perceived Socratic as helpful and trustworthy, supporting its potential for adoption. Our findings also suggest promising directions both for AI research and its practical applications to enhance human teamwork.


Representation Engineering for Large-Language Models: Survey and Research Challenges

arXiv.org Artificial Intelligence

Large-language models are capable of completing a variety of tasks, but remain unpredictable and intractable. Representation engineering seeks to resolve this problem through a new approach utilizing samples of contrasting inputs to detect and edit high-level representations of concepts such as honesty, harmfulness or power-seeking. We formalize the goals and methods of representation engineering to present a cohesive picture of work in this emerging discipline. We compare it with alternative approaches, such as mechanistic interpretability, prompt-engineering and fine-tuning. We outline risks such as performance decrease, compute time increases and steerability issues. We present a clear agenda for future research to build predictable, dynamic, safe and personalizable LLMs.


The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve?

arXiv.org Artificial Intelligence

Motivated by reducing the computational and storage costs of LLMs, model compression and KV cache compression have attracted much attention from researchers. However, current methods predominantly emphasize maintaining the performance of compressed LLMs, as measured by perplexity or simple accuracy on tasks of common sense knowledge QA and basic arithmetic reasoning. In this blog, we present a brief review of recent advancements in LLMs related to retrieval-augmented generation, multi-step reasoning, external tools, and computational expressivity, all of which substantially enhance LLM performance. Then, we propose a lottery LLM hypothesis suggesting that for a given LLM and task, there exists a smaller lottery LLM capable of producing the same performance as the original LLM with the assistance of multi-step reasoning and external tools. Based on the review of current progress in LLMs, we discuss and summarize the essential capabilities that the lottery LLM and KV cache compression must possess, which are currently overlooked in existing methods.


Enriching Physical-Virtual Interaction in AR Gaming by Tracking Identical Real Objects

arXiv.org Artificial Intelligence

Augmented reality (AR) games, particularly those designed for headsets, have become increasingly prevalent with advancements in both hardware and software. However, the majority of AR games still rely on pre-scanned or static scenes, and interaction mechanisms are often limited to controllers or hand-tracking. Additionally, the presence of identical objects in AR games poses challenges for conventional object tracking techniques, which often struggle to differentiate between identical objects or necessitate the installation of fixed cameras for global object movement tracking. In response to these limitations, we present a novel approach to address the tracking of identical objects in an AR scene to enrich physical-virtual interaction. Our method leverages partial scene observations captured by an AR headset, utilizing the perspective and spatial data provided by this technology. Object identities within the scene are determined through the solution of a label assignment problem using integer programming. To enhance computational efficiency, we incorporate a Voronoi diagram-based pruning method into our approach. Our implementation of this approach in a farm-to-table AR game demonstrates its satisfactory performance and robustness. Furthermore, we showcase the versatility and practicality of our method through applications in AR storytelling and a simulated gaming robot. Our video demo is available at: https://youtu.be/rPGkLYuKvCQ.