Goto

Collaborating Authors

 Guo, Lei


Learning a Class of Mixed Linear Regressions: Global Convergence under General Data Conditions

arXiv.org Machine Learning

Mixed linear regression (MLR) has attracted increasing attention because of its great theoretical and practical importance in capturing nonlinear relationships by utilizing a mixture of linear regression sub-models. Although considerable efforts have been devoted to the learning problem of such systems, i.e., estimating data labels and identifying model parameters, most existing investigations employ the offline algorithm, impose the strict independent and identically distributed (i.i.d.) or persistent excitation (PE) conditions on the regressor data, and provide local convergence results only. In this paper, we investigate the recursive estimation and data clustering problems for a class of stochastic MLRs with two components. To address this inherently nonconvex optimization problem, we propose a novel two-step recursive identification algorithm to estimate the true parameters, where the direction vector and the scaling coefficient of the unknown parameters are estimated by the least squares and the expectation-maximization (EM) principles, respectively. Under a general data condition, which is much weaker than the traditional i.i.d. and PE conditions, we establish the global convergence and the convergence rate of the proposed identification algorithm for the first time. Furthermore, we prove that, without any excitation condition on the regressor data, the data clustering performance including the cumulative mis-classification error and the within-cluster error can be optimal asymptotically. Finally, we provide a numerical example to illustrate the performance of the proposed learning algorithm.


Feedback Favors the Generalization of Neural ODEs

arXiv.org Artificial Intelligence

The well-known generalization problem hinders the application of artificial neural networks in continuous-time prediction tasks with varying latent dynamics. In sharp contrast, biological systems can neatly adapt to evolving environments benefiting from real-time feedback mechanisms. Inspired by the feedback philosophy, we present feedback neural networks, showing that a feedback loop can flexibly correct the learned latent dynamics of neural ordinary differential equations (neural ODEs), leading to a prominent generalization improvement. The feedback neural network is a novel two-DOF neural network, which possesses robust performance in unseen scenarios with no loss of accuracy performance on previous tasks. A linear feedback form is presented to correct the learned latent dynamics firstly, with a convergence guarantee. Then, domain randomization is utilized to learn a nonlinear neural feedback form. Stemming from residual neural networks (He et al., 2016), neural ordinary differential equation (neural ODE) (Chen et al., 2018) emerges as a novel learning strategy aiming at learning the latent dynamic model of an unknown system. Recently, neural ODEs have been successfully applied to various scenarios, especially continuous-time missions (Liu & Stacey, 2024; Verma et al., 2024; Greydanus et al., 2019; Cranmer et al., 2020). However, like traditional neural networks, the generalization problem limits the application of neural ODEs in real-world applications. Traditional strategies like model simplification, fit coarsening, data augmentation, and transfer learning have considerably improved the generalization performance of neural networks on unseen tasks (Rohlfs, 2022). However, these strategies usually reduce the Figure 1: Neural network architectures.


3D-CT-GPT: Generating 3D Radiology Reports through Integration of Large Vision-Language Models

arXiv.org Artificial Intelligence

Medical image analysis is crucial in modern radiological diagnostics, especially given the exponential growth in medical imaging data. The demand for automated report generation systems has become increasingly urgent. While prior research has mainly focused on using machine learning and multimodal language models for 2D medical images, the generation of reports for 3D medical images has been less explored due to data scarcity and computational complexities. This paper introduces 3D-CT-GPT, a Visual Question Answering (VQA)-based medical visual language model specifically designed for generating radiology reports from 3D CT scans, particularly chest CTs. Extensive experiments on both public and private datasets demonstrate that 3D-CT-GPT significantly outperforms existing methods in terms of report accuracy and quality. Although current methods are few, including the partially open-source CT2Rep and the open-source M3D, we ensured fair comparison through appropriate data conversion and evaluation methodologies. Experimental results indicate that 3D-CT-GPT enhances diagnostic accuracy and report coherence, establishing itself as a robust solution for clinical radiology report generation. Future work will focus on expanding the dataset and further optimizing the model to enhance its performance and applicability.


Disturbance Observer for Estimating Coupled Disturbances

arXiv.org Artificial Intelligence

High-precision control for nonlinear systems is impeded by the low-fidelity dynamical model and external disturbance. Especially, the intricate coupling between internal uncertainty and external disturbance is usually difficult to be modeled explicitly. Here we show an effective and convergent algorithm enabling accurate estimation of the coupled disturbance via combining control and learning philosophies. Specifically, by resorting to Chebyshev series expansion, the coupled disturbance is firstly decomposed into an unknown parameter matrix and two known structures depending on system state and external disturbance respectively. A Regularized Least Squares (RLS) algorithm is subsequently formalized to learn the parameter matrix by using historical time-series data. Finally, a higher-order disturbance observer (HODO) is developed to achieve a high-precision estimation of the coupled disturbance by utilizing the learned portion. The efficiency of the proposed algorithm is evaluated through extensive simulations. We believe this work can offer a new option to merge learning schemes into the control framework for addressing existing intractable control problems.


Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning

arXiv.org Artificial Intelligence

In the medical multi-modal frameworks, the alignment of cross-modality features presents a significant challenge. However, existing works have learned features that are implicitly aligned from the data, without considering the explicit relationships in the medical context. This data-reliance may lead to low generalization of the learned alignment relationships. In this work, we propose the Eye-gaze Guided Multi-modal Alignment (EGMA) framework to harness eye-gaze data for better alignment of medical visual and textual features. We explore the natural auxiliary role of radiologists' eye-gaze data in aligning medical images and text, and introduce a novel approach by using eye-gaze data, collected synchronously by radiologists during diagnostic evaluations. We conduct downstream tasks of image classification and image-text retrieval on four medical datasets, where EGMA achieved state-of-the-art performance and stronger generalization across different datasets. Additionally, we explore the impact of varying amounts of eye-gaze data on model performance, highlighting the feasibility and utility of integrating this auxiliary data into multi-modal alignment framework.


C2FAR: Coarse-to-Fine Autoregressive Networks for Precise Probabilistic Forecasting

arXiv.org Artificial Intelligence

C2FAR generates a hierarchical, coarse-to-fine discretization of a variable autoregressively; progressively finer intervals of support are generated from a sequence of binned distributions, where each distribution is conditioned on previously-generated coarser intervals. Unlike prior (flat) binned distributions, C2FAR can represent values with exponentially higher precision, for only a linear increase in complexity. We use C2FAR for probabilistic forecasting via a recurrent neural network, thus modeling time series autoregressively in both space and time. C2FAR is the first method to simultaneously handle discrete and continuous series of arbitrary scale and distribution shape. This flexibility enables a variety of time series use cases, including anomaly detection, interpolation, and compression. C2FAR achieves improvements over the state-of-the-art on several benchmark forecasting datasets.


SutraNets: Sub-series Autoregressive Networks for Long-Sequence, Probabilistic Forecasting

arXiv.org Artificial Intelligence

We propose SutraNets, a novel method for neural probabilistic forecasting of long-sequence time series. SutraNets use an autoregressive generative model to factorize the likelihood of long sequences into products of conditional probabilities. When generating long sequences, most autoregressive approaches suffer from harmful error accumulation, as well as challenges in modeling long-distance dependencies. SutraNets treat long, univariate prediction as multivariate prediction over lower-frequency sub-series. Autoregression proceeds across time and across sub-series in order to ensure coherent multivariate (and, hence, high-frequency univariate) outputs. Since sub-series can be generated using fewer steps, SutraNets effectively reduce error accumulation and signal path distances. We find SutraNets to significantly improve forecasting accuracy over competitive alternatives on six real-world datasets, including when we vary the number of sub-series and scale up the depth and width of the underlying sequence models.


Global Convergence of Online Identification for Mixed Linear Regression

arXiv.org Machine Learning

Mixed linear regression (MLR) is a powerful model for characterizing nonlinear relationships by utilizing a mixture of linear regression sub-models. The identification of MLR is a fundamental problem, where most of the existing results focus on offline algorithms, rely on independent and identically distributed (i.i.d) data assumptions, and provide local convergence results only. This paper investigates the online identification and data clustering problems for two basic classes of MLRs, by introducing two corresponding new online identification algorithms based on the expectation-maximization (EM) principle. It is shown that both algorithms will converge globally without resorting to the traditional i.i.d data assumptions. The main challenge in our investigation lies in the fact that the gradient of the maximum likelihood function does not have a unique zero, and a key step in our analysis is to establish the stability of the corresponding differential equation in order to apply the celebrated Ljung's ODE method. It is also shown that the within-cluster error and the probability that the new data is categorized into the correct cluster are asymptotically the same as those in the case of known parameters. Finally, numerical simulations are provided to verify the effectiveness of our online algorithms.


ChatRadio-Valuer: A Chat Large Language Model for Generalizable Radiology Report Generation Based on Multi-institution and Multi-system Data

arXiv.org Artificial Intelligence

Radiology report generation, as a key step in medical image analysis, is critical to the quantitative analysis of clinically informed decision-making levels. However, complex and diverse radiology reports with cross-source heterogeneity pose a huge generalizability challenge to the current methods under massive data volume, mainly because the style and normativity of radiology reports are obviously distinctive among institutions, body regions inspected and radiologists. Recently, the advent of large language models (LLM) offers great potential for recognizing signs of health conditions. To resolve the above problem, we collaborate with the Second Xiangya Hospital in China and propose ChatRadio-Valuer based on the LLM, a tailored model for automatic radiology report generation that learns generalizable representations and provides a basis pattern for model adaptation in sophisticated analysts' cases. Specifically, ChatRadio-Valuer is trained based on the radiology reports from a single institution by means of supervised fine-tuning, and then adapted to disease diagnosis tasks for human multi-system evaluation (i.e., chest, abdomen, muscle-skeleton, head, and maxillofacial $\&$ neck) from six different institutions in clinical-level events. The clinical dataset utilized in this study encompasses a remarkable total of \textbf{332,673} observations. From the comprehensive results on engineering indicators, clinical efficacy and deployment cost metrics, it can be shown that ChatRadio-Valuer consistently outperforms state-of-the-art models, especially ChatGPT (GPT-3.5-Turbo) and GPT-4 et al., in terms of the diseases diagnosis from radiology reports. ChatRadio-Valuer provides an effective avenue to boost model generalization performance and alleviate the annotation workload of experts to enable the promotion of clinical AI applications in radiology reports.


Asymptotically Efficient Online Learning for Censored Regression Models Under Non-I.I.D Data

arXiv.org Artificial Intelligence

The asymptotically efficient online learning problem is investigated for stochastic censored regression models, which arise from various fields of learning and statistics but up to now still lacks comprehensive theoretical studies on the efficiency of the learning algorithms. For this, we propose a two-step online algorithm, where the first step focuses on achieving algorithm convergence, and the second step is dedicated to improving the estimation performance. Under a general excitation condition on the data, we show that our algorithm is strongly consistent and asymptotically normal by employing the stochastic Lyapunov function method and limit theories for martingales. Moreover, we show that the covariances of the estimates can achieve the Cramér-Rao(C-R) bound asymptotically, indicating that the performance of the proposed algorithm is the best possible that one can expect in general. Unlike most of the existing works, our results are obtained without resorting to the traditionally used but stringent conditions such as independent and identically distributed (i.i.d) assumption on the data, and thus our results do not exclude applications to stochastic dynamical systems with feedback. A numerical example is also provided to illustrate the superiority of the proposed online algorithm over the existing related ones in the literature. Keywords: stochastic dynamical systems, censored regression models, online learning, non-i.i.d data, cramér-Rao bound.