AITopics | Chen, Yiwei

Collaborating Authors

Chen, Yiwei

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Safety Mirage: How Spurious Correlations Undermine VLM Safety Fine-tuning

Chen, Yiwei, Yao, Yuguang, Zhang, Yihua, Shen, Bingquan, Liu, Gaowen, Liu, Sijia

arXiv.org Artificial IntelligenceMar-14-2025

Recent vision-language models (VLMs) have made remarkable strides in generative modeling with multimodal inputs, particularly text and images. However, their susceptibility to generating harmful content when exposed to unsafe queries raises critical safety concerns. While current alignment strategies primarily rely on supervised safety fine-tuning with curated datasets, we identify a fundamental limitation we call the "safety mirage" where supervised fine-tuning inadvertently reinforces spurious correlations between superficial textual patterns and safety responses, rather than fostering deep, intrinsic mitigation of harm. We show that these spurious correlations leave fine-tuned VLMs vulnerable even to a simple one-word modification-based attack, where substituting a single word in text queries with a spurious correlation-inducing alternative can effectively bypass safeguards. Additionally, these correlations contribute to the over prudence, causing fine-tuned VLMs to refuse benign queries unnecessarily. To address this issue, we show machine unlearning (MU) as a powerful alternative to supervised safety fine-tuning as it avoids biased feature-label mappings and directly removes harmful knowledge from VLMs while preserving their general capabilities. Extensive evaluations across safety benchmarks show that under one-word attacks, MU-based alignment reduces the attack success rate by up to 60.17% and cuts unnecessary rejections by over 84.20%. Codes are available at https://github.com/OPTML-Group/VLM-Safety-MU. WARNING: There exist AI generations that may be offensive in nature.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2503.11832

Country: Asia (0.14)

Genre: Research Report > New Finding (0.67)

Industry: Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.48)

Add feedback

The R2D2 Deep Neural Network Series for Scalable Non-Cartesian Magnetic Resonance Imaging

Chen, Yiwei, Aghabiglou, Amir, Chen, Shijie, Torki, Motahare, Tang, Chao, van Heeswijk, Ruud B., Wiaux, Yves

arXiv.org Artificial IntelligenceMar-13-2025

We introduce the R2D2 Deep Neural Network (DNN) series paradigm for fast and scalable image reconstruction from highly-accelerated non-Cartesian k-space acquisitions in Magnetic Resonance Imaging (MRI). While unrolled DNN architectures provide a robust image formation approach via data-consistency layers, embedding non-uniform fast Fourier transform operators in a DNN can become impractical to train at large scale, e.g in 2D MRI with a large number of coils, or for higher-dimensional imaging. Plug-and-play approaches that alternate a learned denoiser blind to the measurement setting with a data-consistency step are not affected by this limitation but their highly iterative nature implies slow reconstruction. To address this scalability challenge, we leverage the R2D2 paradigm that was recently introduced to enable ultra-fast reconstruction for large-scale Fourier imaging in radio astronomy. R2D2's reconstruction is formed as a series of residual images iteratively estimated as outputs of DNN modules taking the previous iteration's data residual as input. The method can be interpreted as a learned version of the Matching Pursuit algorithm. A series of R2D2 DNN modules were sequentially trained in a supervised manner on the fastMRI dataset and validated for 2D multi-coil MRI in simulation and on real data, targeting highly under-sampled radial k-space sampling. Results suggest that a series with only few DNNs achieves superior reconstruction quality over its unrolled incarnation R2D2-Net (whose training is also much less scalable), and over the state-of-the-art diffusion-based "Decomposed Diffusion Sampler" approach (also characterised by a slower reconstruction process).

artificial intelligence, deep learning, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2503.09559

Country: Europe > United Kingdom (0.28)

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.88)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)
Information Technology > Sensing and Signal Processing > Image Processing (0.87)

Add feedback

Retrospective Learning from Interactions

Chen, Zizhao, Gul, Mustafa Omer, Chen, Yiwei, Geng, Gloria, Wu, Anne, Artzi, Yoav

arXiv.org Artificial IntelligenceOct-17-2024

Multi-turn interactions between large language models (LLMs) and users naturally include implicit feedback signals. If an LLM responds in an unexpected way to an instruction, the user is likely to signal it by rephrasing the request, expressing frustration, or pivoting to an alternative task. Such signals are task-independent and occupy a relatively constrained subspace of language, allowing the LLM to identify them even if it fails on the actual task. This creates an avenue for continually learning from interactions without additional annotations. We introduce ReSpect, a method to learn from such signals in past interactions via retrospection. We deploy ReSpect in a new multimodal interaction scenario, where humans instruct an LLM to solve an abstract reasoning task with a combinatorial solution space. Through thousands of interactions with humans, we show how ReSpect gradually improves task completion rate from 31% to 82%, all without any external annotation.

artificial intelligence, large language model, natural language, (18 more...)

arXiv.org Artificial Intelligence

2410.13852

Country:

Asia (1.00)
Europe (0.92)
North America > United States (0.68)

Genre: Research Report > New Finding (0.93)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Scalable Non-Cartesian Magnetic Resonance Imaging with R2D2

Chen, Yiwei, Tang, Chao, Aghabiglou, Amir, Chu, Chung San, Wiaux, Yves

arXiv.org Artificial IntelligenceMay-28-2024

We propose a new approach for non-Cartesian magnetic resonance image reconstruction. While unrolled architectures provide robustness via data-consistency layers, embedding measurement operators in Deep Neural Network (DNN) can become impractical at large scale. Alternative Plug-and-Play (PnP) approaches, where the denoising DNNs are blind to the measurement setting, are not affected by this limitation and have also proven effective, but their highly iterative nature also affects scalability. To address this scalability challenge, we leverage the "Residual-to-Residual DNN series for high-Dynamic range imaging (R2D2)" approach recently introduced in astronomical imaging. R2D2's reconstruction is formed as a series of residual images, iteratively estimated as outputs of DNNs taking the previous iteration's image estimate and associated data residual as inputs. The method can be interpreted as a learned version of the Matching Pursuit algorithm. We demonstrate R2D2 in simulation, considering radial k-space sampling acquisition sequences. Our preliminary results suggest that R2D2 achieves: (i) suboptimal performance compared to its unrolled incarnation R2D2-Net, which is however non-scalable due to the necessary embedding of NUFFT-based data-consistency layers; (ii) superior reconstruction quality to a scalable version of R2D2-Net embedding an FFT-based approximation for data consistency; (iii) superior reconstruction quality to PnP, while only requiring few iterations.

artificial intelligence, machine learning, r2d2-net, (15 more...)

arXiv.org Artificial Intelligence

2403.17905

Country: Europe > United Kingdom (0.14)

Genre: Research Report (1.00)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Add feedback

Exploring Boundary of GPT-4V on Marine Analysis: A Preliminary Case Study

Zheng, Ziqiang, Chen, Yiwei, Zhang, Jipeng, Vu, Tuan-Anh, Zeng, Huimin, Tim, Yue Him Wong, Yeung, Sai-Kit

arXiv.org Artificial IntelligenceJan-4-2024

Large language models (LLMs) have demonstrated a powerful ability to answer various queries as a general-purpose assistant. The continuous multi-modal large language models (MLLM) empower LLMs with the ability to perceive visual signals. The launch of GPT-4 (Generative Pre-trained Transformers) has generated significant interest in the research communities. GPT-4V(ison) has demonstrated significant power in both academia and industry fields, as a focal point in a new artificial intelligence generation. Though significant success was achieved by GPT-4V, exploring MLLMs in domain-specific analysis (e.g., marine analysis) that required domain-specific knowledge and expertise has gained less attention. In this study, we carry out the preliminary and comprehensive case study of utilizing GPT-4V for marine analysis. This report conducts a systematic evaluation of existing GPT-4V, assessing the performance of GPT-4V on marine research and also setting a new standard for future developments in MLLMs. The experimental results of GPT-4V show that the responses generated by GPT-4V are still far away from satisfying the domain-specific requirements of the marine professions. All images and prompts used in this study will be available at https://github.com/hkust-vgd/Marine_GPT-4V_Eval

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2401.02147

Country: Asia > China (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Single-photon Image Super-resolution via Self-supervised Learning

Chen, Yiwei, Jiang, Chen, Pan, Yu

arXiv.org Artificial IntelligenceMar-3-2023

Single-Photon Image Super-Resolution (SPISR) aims to recover a high-resolution volumetric photon counting cube from a noisy low-resolution one by computational imaging algorithms. In real-world scenarios, pairs of training samples are often expensive or impossible to obtain. By extending Equivariant Imaging (EI) to volumetric single-photon data, we propose a self-supervised learning framework for the SPISR task. Particularly, using the Poisson unbiased Kullback-Leibler risk estimator and equivariance, our method is able to learn from noisy measurements without ground truths. Comprehensive experiments on simulated and real-world dataset demonstrate that the proposed method achieves comparable performance with supervised learning and outperforms interpolation-based methods.

artificial intelligence, imaging, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2303.02033

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.83)

Add feedback

Quantum Language Model with Entanglement Embedding for Question Answering

Chen, Yiwei, Pan, Yu, Dong, Daoyi

arXiv.org Artificial IntelligenceAug-22-2020

Quantum Language Models (QLMs) in which words are modelled as quantum superposition of sememes have demonstrated a high level of model transparency and good post-hoc interpretability. Nevertheless, in the current literature word sequences are basically modelled as a classical mixture of word states, which cannot fully exploit the potential of a quantum probabilistic description. A full quantum model is yet to be developed to explicitly capture the non-classical correlations within the word sequences. We propose a neural network model with a novel Entanglement Embedding (EE) module, whose function is to transform the word sequences into entangled pure states of many-body quantum systems. Strong quantum entanglement, which is the central concept of quantum information and an indication of parallelized correlations among the words, is observed within the word sequences. Numerical experiments show that the proposed QLM with EE (QLM-EE) achieves superior performance compared with the classical deep neural network models and other QLMs on Question Answering (QA) datasets. In addition, the post-hoc interpretability of the model can be improved by quantizing the degree of entanglement among the words.

deep learning, entanglement, neural network, (23 more...)

arXiv.org Artificial Intelligence

2008.09943

Country: Oceania > Australia (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback