Goto

Collaborating Authors

 Liu, Zhiyu


Efficient Over-parameterized Matrix Sensing from Noisy Measurements via Alternating Preconditioned Gradient Descent

arXiv.org Machine Learning

We consider the noisy matrix sensing problem in the over-parameterization setting, where the estimated rank $r$ is larger than the true rank $r_\star$. Specifically, our main objective is to recover a matrix $ X_\star \in \mathbb{R}^{n_1 \times n_2} $ with rank $ r_\star $ from noisy measurements using an over-parameterized factorized form $ LR^\top $, where $ L \in \mathbb{R}^{n_1 \times r}, \, R \in \mathbb{R}^{n_2 \times r} $ and $ \min\{n_1, n_2\} \ge r > r_\star $, with the true rank $ r_\star $ being unknown. Recently, preconditioning methods have been proposed to accelerate the convergence of matrix sensing problem compared to vanilla gradient descent, incorporating preconditioning terms $ (L^\top L + \lambda I)^{-1} $ and $ (R^\top R + \lambda I)^{-1} $ into the original gradient. However, these methods require careful tuning of the damping parameter $\lambda$ and are sensitive to initial points and step size. To address these limitations, we propose the alternating preconditioned gradient descent (APGD) algorithm, which alternately updates the two factor matrices, eliminating the need for the damping parameter and enabling faster convergence with larger step sizes. We theoretically prove that APGD achieves near-optimal error convergence at a linear rate, starting from arbitrary random initializations. Through extensive experiments, we validate our theoretical results and demonstrate that APGD outperforms other methods, achieving the fastest convergence rate. Notably, both our theoretical analysis and experimental results illustrate that APGD does not rely on the initialization procedure, making it more practical and versatile.


Low-Tubal-Rank Tensor Recovery via Factorized Gradient Descent

arXiv.org Artificial Intelligence

This paper considers the problem of recovering a tensor with an underlying low-tubal-rank structure from a small number of corrupted linear measurements. Traditional approaches tackling such a problem require the computation of tensor Singular Value Decomposition (t-SVD), that is a computationally intensive process, rendering them impractical for dealing with large-scale tensors. Aim to address this challenge, we propose an efficient and effective low-tubal-rank tensor recovery method based on a factorization procedure akin to the Burer-Monteiro (BM) method. Precisely, our fundamental approach involves decomposing a large tensor into two smaller factor tensors, followed by solving the problem through factorized gradient descent (FGD). This strategy eliminates the need for t-SVD computation, thereby reducing computational costs and storage requirements. We provide rigorous theoretical analysis to ensure the convergence of FGD under both noise-free and noisy situations. Additionally, it is worth noting that our method does not require the precise estimation of the tensor tubal-rank. Even in cases where the tubal-rank is slightly overestimated, our approach continues to demonstrate robust performance. A series of experiments have been carried out to demonstrate that, as compared to other popular ones, our approach exhibits superior performance in multiple scenarios, in terms of the faster computational speed and the smaller convergence error.


Gemini: A Family of Highly Capable Multimodal Models

arXiv.org Artificial Intelligence

This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of Gemini models in cross-modal reasoning and language understanding will enable a wide variety of use cases and we discuss our approach toward deploying them responsibly to users.


Specification mining and automated task planning for autonomous robots based on a graph-based spatial temporal logic

arXiv.org Artificial Intelligence

We aim to enable an autonomous robot to learn new skills from demo videos and use these newly learned skills to accomplish non-trivial high-level tasks. The goal of developing such autonomous robot involves knowledge representation, specification mining, and automated task planning. For knowledge representation, we use a graph-based spatial temporal logic (GSTL) to capture spatial and temporal information of related skills demonstrated by demo videos. We design a specification mining algorithm to generate a set of parametric GSTL formulas from demo videos by inductively constructing spatial terms and temporal formulas. The resulting parametric GSTL formulas from specification mining serve as a domain theory, which is used in automated task planning for autonomous robots. We propose an automatic task planning based on GSTL where a proposer is used to generate ordered actions, and a verifier is used to generate executable task plans. A table setting example is used throughout the paper to illustrate the main ideas.


Respiratory Motion Correction in Abdominal MRI using a Densely Connected U-Net with GAN-guided Training

arXiv.org Artificial Intelligence

Abdominal magnetic resonance imaging (MRI) provides a straightforward way of characterizing tissue and locating lesions of patients as in standard diagnosis. However, abdominal MRI often suffers from respiratory motion artifacts, which leads to blurring and ghosting that significantly deteriorates the imaging quality. Conventional methods to reduce or eliminate these motion artifacts include breath holding, patient sedation, respiratory gating, and image post-processing, but these strategies inevitably involve extra scanning time and patient discomfort. In this paper, we propose a novel deep-learning-based model to recover MR images from respiratory motion artifacts. The proposed model comprises a densely connected U-net with generative adversarial network (GAN)- guided training and a perceptual loss function. We validate the model using a diverse collection of MRI data that are adversely affected by both synthetic and authentic respiration artifacts. Effective outcomes of motion removal are demonstrated. Our experimental results show the great potential of utilizing deep-learning-based methods in respiratory motion correction for abdominal MRI.