Goto

Collaborating Authors

 Zhang, Cheng


Calibrating Verbal Uncertainty as a Linear Feature to Reduce Hallucinations

arXiv.org Artificial Intelligence

LLMs often adopt an assertive language style also when making false claims. Such ``overconfident hallucinations'' mislead users and erode trust. Achieving the ability to express in language the actual degree of uncertainty around a claim is therefore of great importance. We find that ``verbal uncertainty'' is governed by a single linear feature in the representation space of LLMs, and show that this has only moderate correlation with the actual ``semantic uncertainty'' of the model. We apply this insight and show that (1) the mismatch between semantic and verbal uncertainty is a better predictor of hallucinations than semantic uncertainty alone and (2) we can intervene on verbal uncertainty at inference time and reduce hallucinations on short-form answers, achieving an average relative reduction of 32%.


GarmentCrafter: Progressive Novel View Synthesis for Single-View 3D Garment Reconstruction and Editing

arXiv.org Artificial Intelligence

We introduce GarmentCrafter, a new approach that enables non-professional users to create and modify 3D garments from a single-view image. While recent advances in image generation have facilitated 2D garment design, creating and editing 3D garments remains challenging for non-professional users. Existing methods for single-view 3D reconstruction often rely on pre-trained generative models to synthesize novel views conditioning on the reference image and camera pose, yet they lack cross-view consistency, failing to capture the internal relationships across different views. In this paper, we tackle this challenge through progressive depth prediction and image warping to approximate novel views. Subsequently, we train a multi-view diffusion model to complete occluded and unknown clothing regions, informed by the evolving camera pose. By jointly inferring RGB and depth, GarmentCrafter enforces inter-view coherence and reconstructs precise geometries and fine details. Extensive experiments demonstrate that our method achieves superior visual fidelity and inter-view coherence compared to state-of-the-art single-view 3D garment reconstruction methods.


HVI: A New Color Space for Low-light Image Enhancement

arXiv.org Artificial Intelligence

Low-Light Image Enhancement (LLIE) is a crucial computer vision task that aims to restore detailed visual information from corrupted low-light images. Many existing LLIE methods are based on standard RGB (sRGB) space, which often produce color bias and brightness artifacts due to inherent high color sensitivity in sRGB. While converting the images using Hue, Saturation and Value (HSV) color space helps resolve the brightness issue, it introduces significant red and black noise artifacts. To address this issue, we propose a new color space for LLIE, namely Horizontal/Vertical-Intensity (HVI), defined by polarized HS maps and learnable intensity. The former enforces small distances for red coordinates to remove the red artifacts, while the latter compresses the low-light regions to remove the black artifacts. To fully leverage the chromatic and intensity information, a novel Color and Intensity Decoupling Network (CIDNet) is further introduced to learn accurate photometric mapping function under different lighting conditions in the HVI space. Comprehensive results from benchmark and ablation experiments show that the proposed HVI color space with CIDNet outperforms the state-of-the-art methods on 10 datasets. The code is available at https://github.com/Fediory/HVI-CIDNet.


Adversarial Transform Particle Filters

arXiv.org Machine Learning

The particle filter (PF) and the ensemble Kalman filter (EnKF) are widely used for approximate inference in state-space models. From a Bayesian perspective, these algorithms represent the prior by an ensemble of particles and update it to the posterior with new observations over time. However, the PF often suffers from weight degeneracy in high-dimensional settings, whereas the EnKF relies on linear Gaussian assumptions that can introduce significant approximation errors. In this paper, we propose the Adversarial Transform Particle Filter (ATPF), a novel filtering framework that combines the strengths of the PF and the EnKF through adversarial learning. Specifically, importance sampling is used to ensure statistical consistency as in the PF, while adversarially learned transformations, such as neural networks, allow accurate posterior matching for nonlinear and non-Gaussian systems. In addition, we incorporate kernel methods to ease optimization and leverage regularization techniques based on optimal transport for better statistical properties and numerical stability. We provide theoretical guarantees, including generalization bounds for both the analysis and forecast steps of ATPF. Extensive experiments across various nonlinear and non-Gaussian scenarios demonstrate the effectiveness and practical advantages of our method.


PhyloVAE: Unsupervised Learning of Phylogenetic Trees via Variational Autoencoders

arXiv.org Machine Learning

Learning informative representations of phylogenetic tree structures is essential for analyzing evolutionary relationships. Classical distance-based methods have been widely used to project phylogenetic trees into Euclidean space, but they are often sensitive to the choice of distance metric and may lack sufficient resolution. In this paper, we introduce phylogenetic variational autoencoders (PhyloVAEs), an unsupervised learning framework designed for representation learning and generative modeling of tree topologies. Leveraging an efficient encoding mechanism inspired by autoregressive tree topology generation, we develop a deep latent-variable generative model that facilitates fast, parallelized topology generation. Phylo-VAE combines this generative model with a collaborative inference model based on learnable topological features, allowing for high-resolution representations of phylogenetic tree samples. Extensive experiments demonstrate PhyloVAE's robust representation learning capabilities and fast generation of phylogenetic tree topologies. Phylogenetic trees are the foundational structure for describing the evolutionary processes among individuals or groups of biological entities. Reconstructing these trees based on collected biological sequences (e.g., DNA, RNA, protein) from observed species, also known as phylogenetic inference (Felsenstein, 2004), is an essential discipline of computational biology (Fitch, 1971; Felsenstein, 1981; Yang & Rannala, 1997; Ronquist et al., 2012). Large collections of trees obtained from these approaches (e.g., posterior samples from MCMC runs (Ronquist et al., 2012)), however, are often difficult to summarize or visualize due to the discrete and non-Euclidean nature of the tree topology space The classical approach to visualize and analyze distributions of phylogenetic trees is to calculate pairwise distances between the trees and project them into a plane using multidimensional scaling (MDS) (Amenta & Klingner, 2002; Hillis et al., 2005; Jombart et al., 2017). However, these approaches have the shortcoming that one can not map an arbitrary point in the visualization to a tree, and therefore do not form an actual visualization of the relevant tree space.


Provable Sample-Efficient Transfer Learning Conditional Diffusion Models via Representation Learning

arXiv.org Machine Learning

While conditional diffusion models have achieved remarkable success in various applications, they require abundant data to train from scratch, which is often infeasible in practice. To address this issue, transfer learning has emerged as an essential paradigm in small data regimes. Despite its empirical success, the theoretical underpinnings of transfer learning conditional diffusion models remain unexplored. In this paper, we take the first step towards understanding the sample efficiency of transfer learning conditional diffusion models through the lens of representation learning. Inspired by practical training procedures, we assume that there exists a low-dimensional representation of conditions shared across all tasks. Our analysis shows that with a well-learned representation from source tasks, the samplecomplexity of target tasks can be reduced substantially. In addition, we investigate the practical implications of our theoretical results in several real-world applications of conditional diffusion models. Numerical experiments are also conducted to verify our results.


RMTransformer: Accurate Radio Map Construction and Coverage Prediction

arXiv.org Artificial Intelligence

Radio map, or pathloss map prediction, is a crucial method for wireless network modeling and management. By leveraging deep learning to construct pathloss patterns from geographical maps, an accurate digital replica of the transmission environment could be established with less computational overhead and lower prediction error compared to traditional model-driven techniques. While existing state-of-the-art (SOTA) methods predominantly rely on convolutional architectures, this paper introduces a hybrid transformer-convolution model, termed RMTransformer, to enhance the accuracy of radio map prediction. The proposed model features a multi-scale transformer-based encoder for efficient feature extraction and a convolution-based decoder for precise pixel-level image reconstruction. Simulation results demonstrate that the proposed scheme significantly improves prediction accuracy, and over a 30% reduction in root mean square error (RMSE) is achieved compared to typical SOTA approaches.


CARL-GT: Evaluating Causal Reasoning Capabilities of Large Language Models

arXiv.org Artificial Intelligence

Causal reasoning capabilities are essential for large language models (LLMs) in a wide range of applications, such as education and healthcare. But there is still a lack of benchmarks for a better understanding of such capabilities. Current LLM benchmarks are mainly based on conversational tasks, academic math tests, and coding tests. Such benchmarks evaluate LLMs in well-regularized settings, but they are limited in assessing the skills and abilities to solve real-world problems. In this work, we provide a benchmark, named by CARL-GT, which evaluates CAusal Reasoning capabilities of large Language models using Graphs and Tabular data. The benchmark has a diverse range of tasks for evaluating LLMs from causal graph reasoning, knowledge discovery, and decision-making aspects. In addition, effective zero-shot learning prompts are developed for the tasks. In our experiments, we leverage the benchmark for evaluating open-source LLMs and provide a detailed comparison of LLMs for causal reasoning abilities. We found that LLMs are still weak in casual reasoning, especially with tabular data to discover new insights. Furthermore, we investigate and discuss the relationships of different benchmark tasks by analyzing the performance of LLMs. The experimental results show that LLMs have different strength over different tasks and that their performance on tasks in different categories, i.e., causal graph reasoning, knowledge discovery, and decision-making, shows stronger correlation than tasks in the same category.


Refining Salience-Aware Sparse Fine-Tuning Strategies for Language Models

arXiv.org Artificial Intelligence

Parameter-Efficient Fine-Tuning (PEFT) has gained prominence through low-rank adaptation methods like LoRA. In this paper, we focus on sparsity-based PEFT (SPEFT), which introduces trainable sparse adaptations to the weight matrices in the model, offering greater flexibility in selecting fine-tuned parameters compared to low-rank methods. We conduct the first systematic evaluation of salience metrics for SPEFT, inspired by zero-cost NAS proxies, and identify simple gradient-based metrics is reliable, and results are on par with the best alternatives, offering both computational efficiency and robust performance. Additionally, we compare static and dynamic masking strategies, finding that static masking, which predetermines non-zero entries before training, delivers efficiency without sacrificing performance, while dynamic masking offers no substantial benefits. Across NLP tasks, a simple gradient-based, static SPEFT consistently outperforms other fine-tuning methods for LLMs, providing a simple yet effective baseline for SPEFT. Our work challenges the notion that complexity is necessary for effective PEFT. Our work is open source and available to the community at [https://github.com/0-ml/speft].


Resonant Inductive Coupling Power Transfer for Mid-Sized Inspection Robot

arXiv.org Artificial Intelligence

This paper presents a wireless power transfer (WPT) for a mid-sized inspection mobile robot. The objective is to transmit 100 W of power over 1 meter of distance, achieved through lightweight Litz wire coils weighing 320 g held together with a coil structure of 3.54 kg. The Wireless Power Transfer System (WPTS) is mounted onto an unmanned ground vehicle (UGV). The study addresses an investigation of coil design, accounting for misalignment and tolerance issues in resonance-coupled coils. In experimental validation, the system effectively transmits 109.7 W of power over a 1-meter distance, with obstacles present. This achievement yields a system efficiency of 47.14%, a value that is remarkably close to the maximum power transfer point (50%) when the WPTS utilises the full voltage allowance of the capacitor. The paper shows the WPTS charging speed of 5 minutes for 12 V, 0.8 Ah lead acid batteries.