Collaborating Authors: Zhu, Yingying


Computation-Efficient and Recognition-Friendly 3D Point Cloud Privacy Protection

arXiv.org Artificial Intelligence

3D point clouds have been widely used in applications such as self-driving cars, robotics, and CAD models. These applications raise the issue of privacy leakage in 3D point clouds, which, to the best of our knowledge, has not been well studied. Unlike 2D image privacy, which is related to texture and 2D geometric structure, 3D point clouds are texture-less, and their privacy is relevant only to 3D geometric structure. In this work, we define the 3D point cloud privacy problem and propose an efficient privacy-preserving framework named PointFlowGMM that can support downstream classification and segmentation tasks without seeing the original data. Using a flow-based generative model, the point cloud is projected into a latent subspace following a Gaussian mixture distribution. We further design a novel angular similarity loss to obfuscate the original geometric structure and reduce the model size from 767 MB to 120 MB without a decrease in recognition performance. The projected point cloud in the latent space is then randomly rotated by an orthogonal matrix to further protect the original geometric structure; because the class-to-class relationship is preserved after rotation, the protected point cloud can still support recognition tasks. We evaluate our model on multiple datasets and achieve recognition results on the encrypted point clouds comparable to those on the original point clouds.
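As a rough, hypothetical illustration of the latent-space protection step described above (not the PointFlowGMM implementation), the sketch below rotates latent feature vectors with a random orthogonal matrix; orthogonal rotation preserves pairwise inner products, which is why class-to-class structure can survive the obfuscation. The latent codes, dimensions, and function names are placeholders.

```python
# Minimal sketch (assumed, not the authors' code): randomly rotating latent
# point-cloud features with an orthogonal matrix. Orthogonal rotations preserve
# pairwise inner products and angles, so class-to-class structure in the latent
# space is retained while the original geometry is obfuscated.
import numpy as np

def random_orthogonal_matrix(dim, rng):
    # QR decomposition of a Gaussian matrix yields a random orthogonal matrix
    # (with a sign fix on the diagonal of R for a proper Haar-like sample).
    a = rng.standard_normal((dim, dim))
    q, r = np.linalg.qr(a)
    return q * np.sign(np.diag(r))

rng = np.random.default_rng(0)
latent = rng.standard_normal((2048, 256))   # hypothetical latent codes, one row per point
rotation = random_orthogonal_matrix(256, rng)
protected = latent @ rotation               # rotated (protected) latent codes

# Pairwise inner products, and hence angular relations, are unchanged.
assert np.allclose(latent @ latent.T, protected @ protected.T, atol=1e-6)
```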


Retrieval-guided Cross-view Image Synthesis

arXiv.org Artificial Intelligence

Cross-view image synthesis involves generating new images of a scene from different viewpoints or perspectives, given one input image from another viewpoint. Despite recent advancements, existing methods have several limitations: 1) reliance on additional data such as semantic segmentation maps or preprocessing modules to bridge the domain gap; 2) insufficient focus on view-specific semantics, leading to compromised image quality and realism; and 3) a lack of diverse datasets representing complex urban environments. To tackle these challenges, we propose: 1) a novel retrieval-guided framework that employs a retrieval network as an embedder to address the domain gap; 2) an innovative generator that enhances semantic consistency and diversity specific to the target view to improve image quality and realism; and 3) a new dataset, VIGOR-GEN, providing diverse cross-view image pairs in urban settings to enrich dataset diversity. Extensive experiments on the well-known CVUSA and CVACT datasets and the new VIGOR-GEN dataset demonstrate that our method generates images of superior realism, significantly outperforming current leading approaches, particularly in SSIM and FID evaluations.

Cross-view image synthesis aims to generate images from a new perspective or viewpoint that differs from the original image: it synthesizes images from a given view (e.g., aerial or bird's-eye view) to a target view (e.g., street or ground view), even when the target viewpoint was not originally captured. It offers a wide range of applications, such as autonomous driving, robot navigation, 3D reconstruction Mahmud et al. (2020), virtual/augmented reality Bischke et al. (2016), and urban planning. In this paper, we probe into ground-to-aerial / aerial-to-ground view synthesis based on a given source-view image (as illustrated in the upper half of Figure 1). This task presents significant challenges, as it requires the model to comprehend and interpret the scene's geometry and object appearances from one view, and then reconstruct or generate a realistic image from a different viewpoint. While promising, existing cross-view image synthesis methods are plagued by several key challenges. Existing methods often rely on extra information such as semantic segmentation maps Regmi & Borji (2018); Tang et al. (2019); Wu et al. (2022), or on preprocessing modules such as polar transformation Lu et al. (2020); Toker et al. (2021); Shi et al. (2022), to bridge the domain gap between different views.
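As a loose, assumed illustration of the retrieval-guided idea (a retrieval network acting as an embedder), the sketch below performs only the retrieval step: a cosine-similarity lookup of the closest target-view embedding for a source-view query. The embeddings here are random placeholders standing in for actual network outputs.

```python
# Minimal sketch (assumed, not the paper's code): retrieving the closest
# target-view embedding for a source-view query in a shared embedding space.
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

rng = np.random.default_rng(0)
gallery = l2_normalize(rng.standard_normal((1000, 512)))  # hypothetical aerial-view embeddings
query = l2_normalize(rng.standard_normal((1, 512)))        # hypothetical ground-view embedding

scores = query @ gallery.T                 # cosine similarity (vectors are unit-norm)
best = int(np.argmax(scores))
retrieved = gallery[best]                  # embedding that would condition the generator
print(best, float(scores[0, best]))
```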


R-CoT: Reverse Chain-of-Thought Problem Generation for Geometric Reasoning in Large Multimodal Models

arXiv.org Artificial Intelligence

Existing Large Multimodal Models (LMMs) struggle with mathematical geometric reasoning due to a lack of high-quality image-text paired data. Current geometric data generation approaches, which apply preset templates to generate geometric data or use Large Language Models (LLMs) to rephrase questions and answers (Q&A), unavoidably limit data accuracy and diversity. To synthesize higher-quality data, we propose a two-stage Reverse Chain-of-Thought (R-CoT) geometry problem generation pipeline. First, we introduce GeoChain to produce high-fidelity geometric images and corresponding descriptions highlighting relations among geometric elements. We then design a Reverse A&Q method that reasons step by step based on the descriptions and generates questions in reverse from the reasoning results. Experiments demonstrate that the proposed method brings significant and consistent improvements on multiple LMM baselines, achieving new performance records in the 2B, 7B, and 8B settings. Notably, R-CoT-8B significantly outperforms previous state-of-the-art open-source mathematical models by 16.6% on MathVista and 9.2% on GeoQA, while also surpassing the closed-source model GPT-4o by an average of 13% across both datasets. The code is available at https://github.com/dle666/R-CoT.

Large Language Models (LLMs) exhibit excellent reasoning capabilities, and the artificial intelligence research community has paid extensive attention (Lu et al., 2023b) to mathematical problem-solving in textual form (Chen et al., 2024b; Liao et al., 2024; Zhou et al., 2024; Zhao et al., 2024b; Zhou & Zhao, 2024; Kim et al., 2024). However, LLMs still struggle to solve mathematical problems involving images that require visual comprehension. Geometry problems, as typical mathematical problems with images, play an important role in evaluating mathematical reasoning skills (Zhang et al., 2023c) and require a high level of visual comprehension. Besides, even problems that are not related to geometry on the surface require the same skills of models (e.g., fine-grained image comprehension and multi-step reasoning). With the appearance of o1 (OpenAI, 2024), GPT-4o (Islam & Moushi, 2024), Gemini (Team et al., 2023), and numerous Large Multimodal Models (LMMs) (Li et al., 2024a; Liu et al., 2024; Chen et al., 2024d; Bai et al., 2023), recent research has progressively investigated using LMMs to solve mathematical geometry problems. Although LMMs show impressive results in general visual question answering (VQA) tasks (Fan et al., 2024; Liu et al., 2024), they still face challenges in solving mathematical geometry problems.
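The following is a minimal, hypothetical sketch of the two-stage idea described above, not the released R-CoT pipeline: first construct a simple figure description (GeoChain-like), reason forward to a derived quantity, then phrase a question in reverse whose answer is that quantity. The templates and the make_sample helper are illustrative assumptions.

```python
# Minimal sketch (assumed): generate a geometric description, reason forward
# step by step, then produce a question "in reverse" from the reasoning result.
import math
import random

def make_sample(rng):
    # Stage 1 (GeoChain-like): construct a figure and its description.
    r = rng.randint(2, 10)
    description = f"Circle O has radius {r}. AB is a diameter of circle O."

    # Step-by-step reasoning from the description to a derived quantity.
    reasoning = [
        f"The diameter AB is twice the radius, so AB = 2 * {r} = {2 * r}.",
        f"The area of circle O is pi * r^2 = {r * r}*pi ~= {math.pi * r * r:.2f}.",
    ]

    # Stage 2 (Reverse A&Q-like): turn the final reasoning result into a question.
    question = "What is the area of circle O?"
    answer = f"{r * r}*pi (approximately {math.pi * r * r:.2f})"
    return {"description": description, "reasoning": reasoning,
            "question": question, "answer": answer}

print(make_sample(random.Random(0)))
```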


Expert Knowledge-Aware Image Difference Graph Representation Learning for Difference-Aware Medical Visual Question Answering

arXiv.org Artificial Intelligence

To contribute to the automation of medical vision-language models, we propose a novel Chest X-ray Difference Visual Question Answering (VQA) task. Given a pair of main and reference images, this task attempts to answer several questions about diseases and, more importantly, the differences between the two images. This is consistent with radiologists' diagnostic practice of comparing the current image with a reference before concluding the report. We collect a new dataset, namely MIMIC-Diff-VQA, including 700,703 QA pairs from 164,324 pairs of main and reference images. Compared to existing medical VQA datasets, our questions are tailored to the Assessment-Diagnosis-Intervention-Evaluation treatment procedure used by clinical professionals. Meanwhile, we also propose a novel expert knowledge-aware graph representation learning model to address this task. The proposed baseline model leverages expert knowledge, such as anatomical structure priors and semantic and spatial knowledge, to construct a multi-relationship graph representing the differences between the two images for the image-difference VQA task. The dataset and code can be found at https://github.com/Holipori/MIMIC-Diff-VQA. We believe this work will further push forward medical vision-language models.
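As an assumed, heavily simplified illustration of the multi-relationship graph described above (not the released code), the sketch below builds a tiny graph whose nodes are anatomical regions of the main and reference images and whose edges carry spatial and difference relations; region names and relation labels are placeholders.

```python
# Minimal sketch (assumed): a multi-relationship graph over anatomical regions
# of a main and a reference chest X-ray, with typed edges for spatial
# (anatomical-prior) relations and image-difference relations.
import networkx as nx

g = nx.MultiDiGraph()

# Nodes: anatomical regions, tagged by which image they come from.
for image in ("main", "reference"):
    for region in ("left lung", "right lung", "cardiac silhouette"):
        g.add_node((image, region))

# Spatial relation inside the main image (anatomical structure prior).
g.add_edge(("main", "left lung"), ("main", "cardiac silhouette"),
           relation="spatial", value="adjacent")

# Difference relation linking the same region across the two images.
g.add_edge(("main", "right lung"), ("reference", "right lung"),
           relation="difference", value="new opacity in main image")

for u, v, data in g.edges(data=True):
    print(u, "->", v, data)
```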


Multi-modal, multi-task, multi-attention (M3) deep learning detection of reticular pseudodrusen: towards automated and accessible classification of age-related macular degeneration

arXiv.org Artificial Intelligence

Objective: Reticular pseudodrusen (RPD), a key feature of age-related macular degeneration (AMD), are poorly detected by human experts on standard color fundus photography (CFP) and typically require advanced imaging modalities such as fundus autofluorescence (FAF). The objective was to develop and evaluate the performance of a novel 'M3' deep learning framework on RPD detection.

Materials and Methods: A deep learning framework, M3, was developed to detect RPD presence accurately using CFP alone, FAF alone, or both, employing 8000 CFP-FAF image pairs obtained prospectively (Age-Related Eye Disease Study 2). The M3 framework includes multi-modal (detection from single or multiple image modalities), multi-task (training different tasks simultaneously to improve generalizability), and multi-attention (improving ensembled feature representation) operation. Performance on RPD detection was compared with state-of-the-art deep learning models and 13 ophthalmologists; performance on detection of two other AMD features (geographic atrophy and pigmentary abnormalities) was also evaluated.

Results: For RPD detection, M3 achieved areas under the receiver operating characteristic curve (AUROC) of 0.832, 0.931, and 0.933 for CFP alone, FAF alone, and both, respectively. M3 performance on CFP was substantially superior to that of human retinal specialists (median F1-score 0.644 versus 0.350). External validation (on the Rotterdam Study, Netherlands) demonstrated high accuracy on CFP alone (AUROC 0.965). The M3 framework also accurately detected geographic atrophy and pigmentary abnormalities (AUROC 0.909 and 0.912, respectively), demonstrating its generalizability.

Conclusion: This study demonstrates the successful development, robust evaluation, and external validation of a novel deep learning framework that enables accessible, accurate, and automated AMD diagnosis and prognosis.

Introduction: Age-related macular degeneration (AMD) is the leading cause of legal blindness in developed countries [1, 2]. Late AMD is the stage with the potential for severe visual loss; it takes two forms, geographic atrophy and neovascular AMD. AMD is traditionally diagnosed and classified using color fundus photography (CFP) [3], the most widely used and accessible imaging modality in ophthalmology. In the absence of late disease, two main features (macular drusen and pigmentary abnormalities) are used to classify disease and stratify the risk of progression to late AMD [3]. More recently, additional imaging modalities have become available in specialist centers, particularly fundus autofluorescence (FAF) imaging [4, 5]. Following these developments in retinal imaging, a third macular feature (reticular pseudodrusen, RPD) is now recognized as a key AMD lesion [6, 7].
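The sketch below is a hypothetical, heavily simplified illustration of the multi-modal and multi-attention ideas (not the authors' M3 code): two small encoders process CFP and FAF inputs, and learned attention weights fuse the two modality features before a binary RPD prediction head. All layer sizes are arbitrary.

```python
# Minimal sketch (assumed): a two-branch network with attention-weighted
# fusion of CFP and FAF features for a binary RPD-presence prediction.
import torch
import torch.nn as nn

class TwoModalityAttentionNet(nn.Module):
    def __init__(self, feat_dim=64):
        super().__init__()
        def encoder():
            return nn.Sequential(
                nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.cfp_enc = encoder()
        self.faf_enc = encoder()
        self.attn = nn.Linear(feat_dim, 1)   # per-modality attention score
        self.head = nn.Linear(feat_dim, 1)   # RPD presence logit

    def forward(self, cfp, faf):
        feats = torch.stack([self.cfp_enc(cfp), self.faf_enc(faf)], dim=1)  # (B, 2, D)
        weights = torch.softmax(self.attn(feats), dim=1)                    # (B, 2, 1)
        fused = (weights * feats).sum(dim=1)                                # (B, D)
        return self.head(fused)

model = TwoModalityAttentionNet()
cfp = torch.randn(2, 3, 64, 64)   # placeholder CFP batch
faf = torch.randn(2, 3, 64, 64)   # placeholder FAF batch
print(model(cfp, faf).shape)      # torch.Size([2, 1])
```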


A Probabilistic Disease Progression Model for Predicting Future Clinical Outcome

arXiv.org Machine Learning

In this work, we consider the problem of predicting the course of a progressive disease, such as cancer or Alzheimer's. Progressive diseases often start with mild symptoms that might precede a diagnosis, and each patient follows their own trajectory. Patient trajectories exhibit wide variability, which can be associated with many factors such as genotype, age, or sex. An additional layer of complexity is that, in real life, the amount and type of data available for each patient can differ significantly. For example, for one patient we might have no prior history, whereas for another we might have detailed clinical assessments obtained at multiple prior time-points. This paper presents a probabilistic model that can handle multiple modalities (including images and clinical assessments) and variable patient histories with irregular timings and missing entries, in order to predict clinical scores at future time-points. We use a sigmoidal function to model latent disease progression, which gives rise to the clinical observations in our generative model. We implement an approximate Bayesian inference strategy for the proposed model to estimate its parameters from data on a large population of subjects. Furthermore, the Bayesian framework enables the model to automatically fine-tune its predictions based on historical observations that might be available for the test subject. We applied our method to a longitudinal Alzheimer's disease dataset with more than 3000 subjects [23] and present a detailed empirical analysis of prediction performance under different scenarios, with comparisons against several benchmarks. We also demonstrate how the proposed model can be interrogated to glean insights about temporal dynamics in Alzheimer's disease.
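As an assumed illustration of the sigmoidal progression idea (using plain least-squares fitting here, not the paper's approximate Bayesian inference), the sketch below defines a sigmoid-in-time trajectory, simulates noisy clinical scores for one subject, and recovers the trajectory parameters to extrapolate a future score. Parameter names and values are placeholders.

```python
# Minimal sketch (assumed): a sigmoidal latent disease-progression curve that
# generates noisy clinical scores, fit to one subject's observations.
import numpy as np
from scipy.optimize import curve_fit

def progression(t, onset, rate, floor, ceiling):
    # Sigmoid in time: scores drift from `floor` toward `ceiling`
    # around a subject-specific onset time and rate.
    return floor + (ceiling - floor) / (1.0 + np.exp(-rate * (t - onset)))

rng = np.random.default_rng(0)
t_obs = np.array([0.0, 1.0, 2.5, 4.0, 6.0])                       # years of follow-up
y_obs = progression(t_obs, 3.0, 1.2, 0.0, 30.0) + rng.normal(0, 1.0, t_obs.size)

params, _ = curve_fit(progression, t_obs, y_obs,
                      p0=[2.0, 1.0, 0.0, 25.0], maxfev=10000)
print("estimated onset/rate/floor/ceiling:", np.round(params, 2))
print("predicted score at year 8:", round(progression(8.0, *params), 2))
```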