Li, Zhipeng
Control System Design and Experiments for Autonomous Underwater Helicopter Docking Procedure Based on Acoustic-inertial-optical Guidance
Li, Haoda, An, Xinyu, Feng, Rendong, Rong, Zhenwei, Zhang, Zhuoyu, Li, Zhipeng, Zhao, Liming, Chen, Ying
This paper proposes a control system structure for the underwater docking procedure of an Autonomous Underwater Helicopter (AUH), which utilizes acoustic-inertial-optical guidance. Compared with conventional Autonomous Underwater Vehicles (AUVs), AUHs face more stringent maneuverability requirements during the docking procedure: the vehicle must remain stationary or exhibit minimal horizontal movement while moving vertically. The docking procedure is divided into two stages, Homing and Landing, each utilizing a different guidance method. Additionally, a segmented aligning strategy operating at various altitudes and a linear velocity-decision scheme are both adopted in the Landing stage. Because of the unique structure of the Subsea Docking System (SDS), the AUH must dock onto the SDS in a fixed orientation with a specific attitude and altitude. A dedicated criterion is therefore proposed to determine whether the AUH has successfully docked onto the SDS. Finally, the effectiveness and robustness of the proposed control method during the AUH's docking procedure are demonstrated through pool experiments and sea trials.
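As a rough illustration of this two-stage logic, the Python sketch below switches from Homing to Landing inside an assumed capture radius, re-aligns within altitude bands before descending (the segmented aligning idea), and scales descent speed linearly with altitude (the linear velocity decision). All thresholds, speeds, and the sensor interface are hypothetical, not values from the paper.

```python
# Hypothetical sketch of the two-stage docking logic; all thresholds and
# speeds are illustrative assumptions, not values from the paper.
from dataclasses import dataclass

@dataclass
class DockingState:
    horizontal_error: float  # distance to dock centre (m), from acoustic/optical fix
    altitude: float          # height above the docking station (m)

HOMING_RADIUS = 5.0  # assumed horizontal error at which Landing begins (m)
ALIGN_BANDS = [(3.0, 0.5), (1.5, 0.2), (0.5, 0.05)]  # (altitude, max error) per band

def docking_step(s: DockingState) -> dict:
    if s.horizontal_error > HOMING_RADIUS:
        # Homing stage: acoustic-inertial guidance toward the dock.
        return {"stage": "homing", "v_horizontal": 0.5, "v_descent": 0.0}
    # Landing stage: optical guidance with segmented aligning at set altitudes.
    for alt, tol in ALIGN_BANDS:
        if s.altitude > alt and s.horizontal_error > tol:
            # Hold altitude and re-align before descending further.
            return {"stage": "aligning", "v_horizontal": 0.1, "v_descent": 0.0}
    # Linear velocity decision: descent speed shrinks linearly with altitude.
    v_down = min(0.2, 0.05 * s.altitude)
    return {"stage": "landing", "v_horizontal": 0.0, "v_descent": v_down}

print(docking_step(DockingState(horizontal_error=0.3, altitude=2.0)))
```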
Automated architectural space layout planning using a physics-inspired generative design framework
Li, Zhipeng, Li, Sichao, Hinchcliffe, Geoff, Maitless, Noam, Birbilis, Nick
During the early design stage, the foundational spatial arrangement is conceptualised, setting the stage for subsequent spatial interactions and functional efficacy. Typically, architects initiate space layout design by creating rough sketches or diagrams to delineate the positions and interrelationships of distinct functional areas, subsequently refining these into multiple design solutions. The meticulous planning of the space layout, which determines the form, size, and circulation patterns of the internal spaces, directly influences the building's operational performance and economic outlay [1, 2]. Layout planning is recognised as a wicked problem due to its inherent complexity and variability [3]. This complexity tends to escalate as the scale and intricacy of a project increase, presenting a compounded challenge for human designers. Computational design and design automation techniques have been utilised extensively within the realm of architecture, offering significant time savings by streamlining repetitive tasks and thereby enhancing designer productivity [4-7]. This efficiency has paved the way for these technologies to be integrated more deeply into architectural practice. Consequently, it is a natural progression to employ these automated techniques to assist designers in the repetitive and complex task of space layout planning. In recent years, generative design and the automated generation of floorplans and space layouts have garnered considerable interest, indicating a potential paradigm shift in design methodologies.
A Comprehensive Survey of Large Language Models and Multimodal Large Language Models in Medicine
Xiao, Hanguang, Zhou, Feizhong, Liu, Xingyue, Liu, Tianqi, Li, Zhipeng, Liu, Xin, Huang, Xiaoxuan
The Transformer's robust parallel computing capability and self-attention mechanism enable the integration of vast amounts of training data, laying the foundation for the development of LLMs and MLLMs [160]. To date, a series of Transformer-based LLMs and MLLMs have emerged (this survey primarily focuses on the vision-language modality), such as the PaLM series [6, 34], GPT series [16, 149], and LLaMA series [192, 193] among LLMs, as well as Gemini [185], GPT-4 [1], and Claude 3 [7] among MLLMs. Owing to their powerful capabilities in understanding, reasoning, and generation, they have achieved state-of-the-art results in various downstream tasks, including text generation, machine translation, and visual question answering (VQA). LLMs and MLLMs demonstrate increasingly powerful generalization abilities, and their impact extends to the medical domain, accelerating the integration of artificial intelligence and medicine [186, 188]. In particular, Google's Med-PaLM 2 [171] achieved a score of 86.5 on the United States Medical Licensing Examination (USMLE) [83], reaching the level of medical experts [267] and further showcasing the enormous potential of LLMs in the medical field. In addition, medical LLMs and MLLMs such as ChatDoctor [116], LLaVA-Med [107], and XrayGLM [211] represent new avenues that artificial intelligence offers the medical field, providing potential solutions for medical report generation [201, 202, 217], clinical diagnosis [168, 195, 212], mental health services [30, 126], and a range of other clinical applications. Despite these academic breakthroughs, hospitals still face challenges in training their own medical LLMs and MLLMs and deploying them in practical clinical applications. First, training requires a substantial amount of medical data, which is often costly to acquire and necessitates annotation by medical experts, while also raising data privacy concerns [257]; all of these factors complicate model development. Second, the immense parameter counts and computation of LLMs and MLLMs demand substantial computational resources for training and deployment [143, 157], significantly raising the threshold for hospitals to adopt them.
Time2Stop: Adaptive and Explainable Human-AI Loop for Smartphone Overuse Intervention
Orzikulova, Adiba, Xiao, Han, Li, Zhipeng, Yan, Yukang, Wang, Yuntao, Shi, Yuanchun, Ghassemi, Marzyeh, Lee, Sung-Ju, Dey, Anind K, Xu, Xuhai "Orson"
Despite a rich history of investigating smartphone overuse intervention techniques, AI-based just-in-time adaptive intervention (JITAI) methods for overuse reduction are lacking. We develop Time2Stop, an intelligent, adaptive, and explainable JITAI system that leverages machine learning to identify optimal intervention timings, introduces interventions with transparent AI explanations, and collects user feedback to establish a human-AI loop and adapt the intervention model over time. We conducted an 8-week field experiment (N=71) to evaluate the effectiveness of both the adaptation and explanation aspects of Time2Stop. Our results indicate that our adaptive models significantly outperform the baseline methods on intervention accuracy (a relative improvement of more than 32.8%) and receptivity (more than 8.0%). Incorporating explanations further enhances effectiveness by 53.8% and 11.4% on accuracy and receptivity, respectively. Moreover, Time2Stop significantly reduces overuse, decreasing app visit frequency by 7.0–8.9%. Our subjective data echoed these quantitative measures: participants preferred the adaptive interventions and rated the system highly on intervention time accuracy, effectiveness, and level of trust. We envision this work inspiring future research on JITAI systems with a human-AI loop that evolve with their users.
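To make the human-AI loop concrete, here is a minimal sketch (not the paper's model) in which a simple online logistic model scores each moment, intervenes past a threshold, and updates its weights from the user's accept/dismiss feedback; the features, threshold, and learning rate are all assumptions.

```python
# Minimal, hypothetical human-AI loop: score, intervene, learn from feedback.
import numpy as np

rng = np.random.default_rng(0)
w = np.zeros(3)  # weights over simple usage features (assumed)

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + np.exp(-z))

def should_intervene(features: np.ndarray, threshold: float = 0.5) -> bool:
    return sigmoid(w @ features) >= threshold

def update_from_feedback(features: np.ndarray, accepted: bool, lr: float = 0.1):
    """One online logistic-regression step on the user's accept/dismiss label."""
    global w
    w += lr * (float(accepted) - sigmoid(w @ features)) * features

for _ in range(100):  # simulated app sessions
    x = rng.random(3)  # e.g. session length, visit frequency, hour of day
    if should_intervene(x):
        accepted = x[0] > 0.5  # stand-in for real user feedback
        update_from_feedback(x, accepted)
```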
Multi-perspective Feedback-attention Coupling Model for Continuous-time Dynamic Graphs
Zhu, Xiaobo, Wu, Yan, Li, Zhipeng, Su, Hailong, Che, Jin, Chen, Zhanheng, Wang, Liying
Recently, representation learning over graph networks has gained popularity, with various models showing promising results. Despite this, several challenges persist: 1) most methods are designed for static or discrete-time dynamic graphs; 2) existing continuous-time dynamic graph algorithms focus on a single evolving perspective; and 3) many continuous-time dynamic graph approaches necessitate numerous temporal neighbors to capture long-term dependencies. In response, this paper introduces the Multi-Perspective Feedback-Attention Coupling (MPFA) model. MPFA incorporates information from both evolving and raw perspectives, efficiently learning the interleaved dynamics of observed processes. The evolving perspective employs temporal self-attention to distinguish continuously evolving temporal neighbors for information aggregation. Through dynamic updates, this perspective can capture long-term dependencies using a small number of temporal neighbors. Meanwhile, the raw perspective utilizes a feedback attention module with growth characteristic coefficients to aggregate raw neighborhood information. Experimental results on a self-organizing dataset and seven public datasets validate the efficacy and competitiveness of our proposed model.
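As a rough sketch of the evolving perspective's aggregation step (under assumed dimensions and a sinusoidal time encoding, neither taken from the paper), temporal self-attention can weight a node's temporal neighbours by feature-plus-time keys:

```python
# Hypothetical temporal self-attention over a node's temporal neighbours;
# dimensions and the sinusoidal time encoding are illustrative assumptions.
import numpy as np

def time_encode(dt: np.ndarray, d: int) -> np.ndarray:
    """Sinusoidal encoding of time deltas (an assumed stand-in)."""
    freqs = 1.0 / (10.0 ** np.arange(d // 2))
    ang = np.outer(dt, freqs)
    return np.concatenate([np.cos(ang), np.sin(ang)], axis=-1)

def temporal_attention(query, neighbor_feats, neighbor_dts):
    """Aggregate neighbour features, keyed on features plus time encodings."""
    d = query.shape[-1]
    keys = neighbor_feats + time_encode(neighbor_dts, d)
    scores = keys @ query / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ neighbor_feats  # message from the temporal neighbourhood

q = np.random.default_rng(0).random(8)            # node embedding
feats = np.random.default_rng(1).random((5, 8))   # 5 temporal neighbours
dts = np.array([0.1, 0.5, 1.0, 2.0, 4.0])         # time since each interaction
print(temporal_attention(q, feats, dts).shape)    # -> (8,)
```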
SUPER Learning: A Supervised-Unsupervised Framework for Low-Dose CT Image Reconstruction
Li, Zhipeng, Ye, Siqi, Long, Yong, Ravishankar, Saiprasad
Recent years have witnessed growing interest in machine learning-based models and techniques for low-dose X-ray CT (LDCT) imaging tasks. These methods can typically be categorized into supervised learning methods and unsupervised or model-based learning methods. Supervised learning methods have recently shown success in image restoration tasks; however, they often rely on large training sets. Model-based learning methods such as dictionary or transform learning do not require large or paired training sets and often generalize well, since they learn general properties of CT image sets. Recent works have shown the promising reconstruction performance of methods such as PWLS-ULTRA that rely on clustering the underlying (reconstructed) image patches into a learned union of transforms. In this paper, we propose a new Supervised-UnsuPERvised (SUPER) reconstruction framework for LDCT image reconstruction that combines the benefits of supervised learning methods and (unsupervised) transform learning-based methods such as PWLS-ULTRA that involve highly image-adaptive clustering. The SUPER model consists of several layers, each of which includes a deep network learned in a supervised manner and an unsupervised iterative method involving image-adaptive components. The SUPER reconstruction algorithms are learned in a greedy manner from training data. The proposed SUPER learning methods dramatically outperform both the constituent supervised learning-based networks and the iterative algorithms for LDCT, while using far fewer iterations in the iterative reconstruction modules.
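The layered structure lends itself to a schematic sketch: each SUPER layer applies a supervised network pass followed by an unsupervised, data-consistent iterative update. Both operators below are crude placeholders (assumptions), not the paper's trained networks or the PWLS-ULTRA solver.

```python
# Schematic SUPER layering; both operators are placeholders (assumptions),
# not the paper's trained networks or the PWLS-ULTRA solver.
import numpy as np

def supervised_net(x: np.ndarray) -> np.ndarray:
    # Stand-in for a deep network trained in a supervised manner.
    return 0.9 * x

def unsupervised_module(x: np.ndarray, y: np.ndarray,
                        n_iter: int = 5, step: float = 0.1) -> np.ndarray:
    # Stand-in for an iterative, image-adaptive model-based update.
    for _ in range(n_iter):
        x = x - step * (x - y)  # pull the estimate toward the data term
    return x

def super_reconstruct(y: np.ndarray, n_layers: int = 4) -> np.ndarray:
    x = y.copy()  # initialize from a crude reconstruction
    for _ in range(n_layers):
        x = supervised_net(x)           # supervised refinement
        x = unsupervised_module(x, y)   # unsupervised, data-consistent update
    return x

y = np.random.default_rng(0).random((64, 64))  # toy "low-dose" image
print(super_reconstruct(y).shape)
```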
Combinatorial Keyword Recommendations for Sponsored Search with Deep Reinforcement Learning
Li, Zhipeng, Wu, Jianwei, Sun, Lin, Rong, Tao
In sponsored search, keyword recommendations help advertisers achieve much better performance within a limited budget. Much work has been done to mine candidate keywords from search logs or landing pages. However, the strategies for selecting among given candidates remain to be improved. Existing relevance-based, popularity-based, and regular combinatorial strategies fail to take the internal and external competition among keywords into consideration. In this paper, we regard keyword recommendation as a combinatorial optimization problem and solve it with a modified pointer network structure. The model is trained within an actor-critic deep reinforcement learning framework. A pre-clustering method called Equal Size K-Means is proposed to accelerate training and testing by reducing the action space. The performance of the framework is evaluated in both offline and online environments, where remarkable improvements are observed.
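The Equal Size K-Means idea can be sketched as standard K-Means with a per-cluster capacity cap, so every cluster ends up with roughly n/k members; the greedy closest-first assignment below is an illustrative assumption, not necessarily the paper's exact procedure.

```python
# Hypothetical Equal Size K-Means: K-Means with a per-cluster capacity cap.
import numpy as np

def equal_size_kmeans(X: np.ndarray, k: int, n_iter: int = 10, seed: int = 0):
    rng = np.random.default_rng(seed)
    n = len(X)
    cap = -(-n // k)  # ceil(n / k): per-cluster capacity
    centers = X[rng.choice(n, k, replace=False)]
    labels = np.zeros(n, dtype=int)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None] - centers[None], axis=-1)
        order = np.argsort(dists.min(axis=1))  # assign closest points first
        counts = np.zeros(k, dtype=int)
        for i in order:
            for c in np.argsort(dists[i]):
                if counts[c] < cap:  # nearest cluster that still has room
                    labels[i] = c
                    counts[c] += 1
                    break
        centers = np.array([X[labels == c].mean(axis=0) if counts[c] else centers[c]
                            for c in range(k)])
    return labels, centers

X = np.random.default_rng(1).random((20, 4))  # toy keyword embeddings
labels, _ = equal_size_kmeans(X, k=4)
print(np.bincount(labels))  # roughly equal cluster sizes
```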
DECT-MULTRA: Dual-Energy CT Image Decomposition With Learned Mixed Material Models and Efficient Clustering
Li, Zhipeng, Ravishankar, Saiprasad, Long, Yong, Fessler, Jeffrey A.
Dual-energy computed tomography (DECT) imaging plays an important role in advanced imaging applications due to its material decomposition capability. Image-domain decomposition operates directly on CT images using linear matrix inversion, but the decomposed material images can be severely degraded by noise and artifacts. This paper proposes a new method dubbed DECT-MULTRA for image-domain DECT material decomposition that combines conventional penalized weighted-least squares (PWLS) estimation with regularization based on a mixed union of learned transforms (MULTRA) model. Our proposed approach pre-learns a union of common-material sparsifying transforms from patches extracted from all the basis materials, and a union of cross-material sparsifying transforms from multi-material patches. The common-material transforms capture the common properties among different material images, while the cross-material transforms capture the cross-dependencies. The proposed PWLS formulation is optimized efficiently by alternating between an image update step and a sparse coding and clustering step, with both of these steps having closed-form solutions. The effectiveness of our method is validated with both XCAT phantom and clinical head data. The results demonstrate that our proposed method provides superior material image quality and decomposition accuracy compared to other competing methods.
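A schematic sketch of the sparse coding and clustering step follows: each patch is assigned to whichever transform in the union sparsifies it best under hard thresholding. The random transforms and the penalty weighting are placeholders (assumptions), not the learned MULTRA models or the paper's exact cost.

```python
# Schematic clustering + sparse coding over a union of transforms; random
# matrices stand in for the learned MULTRA models (an assumption).
import numpy as np

def hard_threshold(z: np.ndarray, gamma: float) -> np.ndarray:
    return z * (np.abs(z) >= gamma)

def cluster_and_code(patches, transforms, gamma=0.1):
    """Assign each patch to the transform with the lowest sparsification cost."""
    labels, codes = [], []
    for p in patches:
        costs, cands = [], []
        for Wk in transforms:
            z = hard_threshold(Wk @ p, gamma)
            # Transform-domain residual plus a sparsity penalty.
            costs.append(np.sum((Wk @ p - z) ** 2) + gamma**2 * np.count_nonzero(z))
            cands.append(z)
        best = int(np.argmin(costs))
        labels.append(best)
        codes.append(cands[best])
    return np.array(labels), codes

rng = np.random.default_rng(0)
transforms = [rng.standard_normal((64, 64)) for _ in range(3)]  # union of 3
patches = rng.standard_normal((10, 64))  # 10 vectorized 8x8 patches
labels, codes = cluster_and_code(patches, transforms)
print(labels)
```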
Sparse-View X-Ray CT Reconstruction Using $\ell_1$ Prior with Learned Transform
Zheng, Xuehang, Chun, Il Yong, Li, Zhipeng, Long, Yong, Fessler, Jeffrey A.
A major challenge in X-ray computed tomography (CT) is reducing the radiation dose while maintaining high quality in the reconstructed images. One way to reduce the dose is to reduce the number of projection views (sparse-view CT); however, it becomes difficult to achieve high-quality image reconstruction as the number of projection views decreases. Researchers have applied the concept of learning sparse representations from (high-quality) CT image datasets to sparse-view CT reconstruction. We propose a new statistical CT reconstruction model that combines penalized weighted-least squares (PWLS) and $\ell_1$ regularization with a learned sparsifying transform (PWLS-ST-$\ell_1$), along with an algorithm for PWLS-ST-$\ell_1$. Numerical experiments for sparse-view 2D fan-beam CT and 3D axial cone-beam CT show that the $\ell_1$ regularizer significantly improves the sharpness of edges in reconstructed images compared to CT reconstruction methods using an edge-preserving regularizer or $\ell_2$ regularization with a learned ST.
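A toy sketch of the PWLS-ST-$\ell_1$ idea, with a small random system matrix and transform standing in for the real CT geometry and learned ST (both assumptions): alternate a soft-thresholding update for the transform-domain codes (the $\ell_1$ proximal step) with a gradient step on the weighted-least-squares data term.

```python
# Toy sketch of PWLS-ST-l1; A, W, and Omega are small random stand-ins
# (assumptions), not a real CT system matrix or a learned transform.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 20))        # toy "sparse-view" system matrix
W = np.eye(30)                           # statistical weights
Omega = rng.standard_normal((20, 20))    # stand-in sparsifying transform
x_true = rng.standard_normal(20)
y = A @ x_true                           # noiseless toy measurements

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

x = np.zeros(20)
beta, gamma, step = 1.0, 0.05, 1e-3
for _ in range(300):
    # Exact l1 proximal update of the transform-domain codes.
    z = soft_threshold(Omega @ x, gamma / (2 * beta))
    # Gradient step on the weighted-least-squares + transform penalty.
    grad = A.T @ W @ (A @ x - y) + beta * Omega.T @ (Omega @ x - z)
    x -= step * grad
print(np.linalg.norm(x - x_true) / np.linalg.norm(x_true))  # relative error
```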